Contents 



Preface viii 
1 Introduction 1

Introduction. Reliability Planning. Definition of System Reliability. 
Overall Target and Allocation to Subsystems. Reliability Modelling and 
Evaluation. Testing and Data Collection. Evaluation of Alternative 
Designs. Reliability Report. Scope of This Book. 

2 The Preliminaries 7

Introduction. Sample Space. Events. Random Variables. Probability 
Laws. Expectation. Variance. Covariance. Moments. Coefficients of 
Skewness and Excess. Transform Methods. Some Special 
Distributions (Exponential, Normal, Log-normal, Weibull, Gamma). 
Stochastic Processes. Probability Distributions. Markov Chains. 
Equilibrium Distribution. Time Specific Behaviour. First Passage 
Times. Alternative Approach to First Passage Times. Continuous 
Parameter Markov Chains. Transient Behaviour. Equilibrium 
Probability Distribution. First Passage Times. Exercises. References. 

3 Frequency and Associated Concepts 63

Introduction. Interstate Transition Rate. The Concept of Frequency. 
Time Specific Domain. Methods of Calculation. Steady State Domain. 
Time Specific Probabilities. Steady State Probabilities. Alternative 
Interpretation of Mean Cycle Time, Mean Duration and Mean
Frequency. The Relationship to Average Values. The Concept of 
Equivalent Transition Rate. References. 

4 System Reliability 89

Introduction. Definition and Description of the System and its 
Requirements. Failure Modes and Effects Analysis. State Space 
Approach. Series System. Parallel Systems. Decomposition Using 
the Conditional Probability Approach. Network Approach. 
Network Reduction Procedure. Cut Set or Tie Set Methods. Tie 
Set Manipulation. Cut Set Manipulation. Frequency Calculation 
Using the Cut Set Approach. Algorithm to Determine Minimal 
Cut Sets. Exercises. References. 






5 Techniques for Large Systems 132 

Introduction. The Problem Areas. Equivalent Transition Rate and 
Conditions of Mergeability. Components Subject to Fluctuating 
Environment. State Space Truncation. Sequential Truncation. 
References. 



6 Reliability Modelling in Non-Markovian Systems 164 

Introduction. The Difficulty with Non-Markovian Processes. Method 
of Supplementary Variables. Semi-Markov Processes. Device of 
Stages. References. 



7 Simulation 211 

Introduction. Basic Procedure. Random Number Generation. 
Simulation Model. Timing Controls. Random Sampling. Estimating 
Reliability Measures. Equilibrium Conditions and Sample Size. 
Variance Reducing Techniques. References. 



8 Conclusions 225 

9 Appendices 227 

Appendix I 227 
Solution of Simultaneous Linear Equations 

Appendix II 230 
Shape of the Hazard Rate Function of Two Series Stage 
Combinations in Parallel 

Appendix III 233 

Hazard Rate Shape of Series Stages in Series with a Distinctive 
Stage 

Appendix IV 234 

Series Stages in Series with Two Parallel Stages 

Appendix V 239 
Moments of Stage Combinations 

Appendix VI 242 
Calculation of the Jacobian Matrix 



Index 245



Preface 



The general area of reliability engineering is extremely wide and in fact 
encompasses all aspects of engineering technology. Conventional intuitive 
approaches to the evaluation of system adequacy are not sufficient in modern 
engineering applications and are gradually being replaced by consistent 
quantitative techniques. A basic and common requirement in any quantitative 
procedure is the development of a suitable mathematical model to describe the 
system. The model may be relatively simple or extremely complex and should 
be capable of numerical manipulation. This book is devoted entirely to this area 
and deals with the concepts, philosophy and techniques for reliability model 
building and evaluation. 

The book begins by outlining the elements of reliability planning and discusses 
the role of modelling in the reliability program plan. Chapter 2 reviews the basic 
probability theory required in subsequent chapters with emphasis on utilization 
in system reliability modelling and evaluation. The reader will find that some 
previous background in probability mathematics is helpful but not necessary. 
Chapter 3 is the key chapter in the book and discusses the concepts of the 
frequency balancing approach. These concepts have been used with considerable 
success in the reliability analysis of repairable systems. This chapter emphasizes 
the calculation of several measures of system reliability. Chapter 4 is concerned 
with determining the system reliability characteristics from the statistical 
information available on the failure and repair cycles of the constituent 
components. Non-maintained systems have been discussed extensively in the 
available literature and, therefore, this book is directed towards maintained 
systems. The theory and procedures are, however, quite general and can be 
equally applied to non-maintained systems. Chapter 5 is devoted to the 
utilization of these concepts in the reliability analysis of relatively large 
systems and some possible problem areas and solutions are presented. The 
available books generally assume constant transition rates and give a cursory 
treatment to non-Markovian models. Chapter 6 is devoted to non-Markovian 
modelling with special emphasis on the device of stages which is a very practical 
approach. The emphasis in the book is on direct analytical methods. A 
discussion of system reliability modelling would, however, be incomplete
without a discussion on simulation methods as given in Chapter 7. 

The scope of application of the concepts outlined in the book 
encompasses virtually all engineering disciplines. The book is therefore 
intended as a general treatise and is not aimed at any specific area of application. 

C. Singh 
R. Billinton 



CHAPTER 1 

Introduction 



Introduction 

System designers have always been concerned with the subject of reliability. The 
general approach has been, however, either intuitive or based on rule of thumb 
criteria derived from previous experience with similar systems. The intuitive 
approach has proved to be inadequate with the growth of complex military and 
industrial systems, where a composite of equipment, skills and techniques 
function as a unified entity. There has been considerable emphasis, in the past 
two decades, on the development of quantitative techniques and indices which 
respond meaningfully to the factors which actually affect the system reliability. 
Quantitative evaluation is achieved by building mathematical models which 
reasonably idealize the actual system and can be manipulated to obtain suitable 
measures of reliability. The role of reliability modelling and evaluation can be 
appreciated by examining the various stages of a general reliability program. 

Reliability Planning 

It is generally agreed that system reliability must be built in at the design stage 
of a project. The desired level of reliability can be achieved only by planning and 
implementing a good reliability program. A reliability program generally consists 
of the following elements: 

1 definition of system reliability 

2 overall target and allocation to subsystems 

3 reliability modelling and evaluation 

4 testing and data collection 

5 evaluation of alternative designs 

6 reliability report 

This sequence is not rigid and in fact many steps may have to be repeated in an 
iterative fashion. A brief description of these steps is as follows. 

1 Definition of System Reliability 

There are several definitions of reliability quoted in the literature but the one 
most often stated in textbooks is 'the probability that the system will perform 



2 System Reliability Modelling and Evaluation 



its intended function for a given period of time under stated environmental 
conditions'. This definition is, however, inadequate for many occasions and is 
restrictive in its scope of application. It is more appropriate to talk of 
quantitative measures which when compared with reference indices, indicate 
expected consistency with or deviation from the required performance. Several 
measures are discussed in detail in Chapter 3 and a brief review of these is given 
here. The measures may be time specific, i.e., functions of time, or steady state 
when they refer to the equilibrium conditions. The former are required when 
the analyst is concerned with the transient behaviour of the system and the 
latter while considering the average behaviour over a long time. 

It is usual in the literature to define reliability indices in terms of system 
success or failure. Many complex systems have, however, several levels of failure. 
For example, a large piece of complex equipment may not be simply working or 
not working but may have many possible output states. It is therefore 
appropriate to define the calculated reliability measures in terms of a subset X 
which may contain any number of system states. In particular applications, X 
can be referred to as success, failure or some other appropriate name. 

Time Specific Domain 

The following indices are commonly used for repairable systems in the transient 
domain: 

(a) Time Specific Availability of Subset $X^+$

This is also called point wise availability or instant availability and is the
probability of the system being in any state contained in $X^+$ at a particular
instant of time $t$.

(b) Fractional Duration of Subset $X^+$

Also known as the interval availability, the fractional duration of $X^+$ is
defined as the expected proportion of the interval $(t_1, t_2)$ spent in $X^+$.

(c) Interval Frequency

The interval frequency is defined as the expected or mean number of times
the subset $X^+$ is encountered in the interval $(t_1, t_2)$.
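
As a rough illustration of the first two indices, consider a single repairable component with a constant failure rate and a constant repair rate which starts in the up state, and let $X^+$ be the up state. The two-state model, the rate values and the function names below are assumptions made for this sketch and do not come from the text. Under these assumptions the point wise availability has a well-known closed form and the fractional duration is its average over $(t_1, t_2)$.

```python
import math

def pointwise_availability(t, lam, mu):
    """P[component is up at time t], given that it is up at t = 0."""
    total = lam + mu
    return mu / total + (lam / total) * math.exp(-total * t)

def fractional_duration(t1, t2, lam, mu):
    """Expected fraction of the interval (t1, t2) spent in the up state.

    This is the integral of the point wise availability over (t1, t2)
    divided by the interval length.
    """
    total = lam + mu
    steady_part = mu / total
    transient_part = (lam / total**2) * (math.exp(-total * t1) - math.exp(-total * t2))
    return steady_part + transient_part / (t2 - t1)

# Illustrative (assumed) rates in events per hour.
lam, mu = 0.1, 1.0
a0 = pointwise_availability(0.0, lam, mu)       # availability at the origin
a_late = pointwise_availability(100.0, lam, mu) # availability far from the origin
fd = fractional_duration(0.0, 10.0, lam, mu)    # interval availability over (0, 10)
print(a0, a_late, fd)
```

The value far from the origin settles at mu / (lam + mu), which anticipates the steady state availability discussed later in this section.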

Reliability 

If the success and failure states are denoted by $X^+$ and $X^-$, then reliability is the
probability of being in $X^+$ at time $t$ without having entered $X^-$. The term
reliability is used in many ways and most often in a qualitative sense to indicate 
concern regarding the ability of the system to perform its intended function. 
This approach is extended to qualitative appraisal where the term reliability is
considered as an intrinsic system parameter which can be measured by various
indices. The definition given above is more specific and reliability is considered 
as a mathematical quantity which is itself a measure. 

Steady State Domain 

(a) Steady State Availability of $X^+$

Commonly called availability, this is the limiting value of both point wise
availability and fractional duration. This can, therefore, be interpreted in two
ways. The first is as the probability of being in a state contained in $X^+$ at some
point of time remote from the origin. The second is as the time spent in $X^+$ as a
fraction of the total time $(0, T)$ as $T$ becomes very large.

(b) Steady State Frequency of Encountering $X^+$

It is more often simply called frequency and can be defined in two ways. The
first is the mean rate at which $X^+$ is being encountered at some point in time
remote from the origin. The second is the average number of encounters of $X^+$
per unit time, considered over a very large time interval.

(c) Mean Cycle Time

This is defined as the mean time between two successive encounters of $X^+$.
It is the reciprocal of frequency.

(d) Mean Duration of $X^+$

It is the expected time of residence in $X^+$ in one cycle of $X^+$.

In addition to the measures defined above, two more useful measures can be
calculated using the concepts outlined in Chapter 2.

(e) Mean First Passage Time

This is the mean time from system initiation to the first encounter of $X^+$ where $X^+$
denotes the system failure condition.

(f) Mean Passage Time

When considered in terms of system failure, the mean passage time is called
the mean time to failure and is the mean time from an instant when the system
is in $X^-$ ($X^-$ is the complement of $X^+$), chosen randomly, to the encounter of $X^+$.
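
The relationships between measures (a) to (d) can be sketched for the simplest possible case: a single component with constant failure rate `lam` and constant repair rate `mu`, with $X^+$ taken to be the up state. The model, the numerical values and the function name are assumptions made for illustration only. The sketch uses the reciprocal relation between cycle time and frequency stated above, together with the standard steady state relation that mean duration equals availability divided by frequency.

```python
def steady_state_measures(lam, mu):
    """Steady state indices of the up state of a two-state repairable component."""
    availability = mu / (lam + mu)
    # In the steady state the up state is entered as often as it is left,
    # and it is left at rate lam while the component is up.
    frequency = availability * lam
    mean_cycle_time = 1.0 / frequency
    mean_duration = availability / frequency
    return {
        "availability": availability,
        "frequency": frequency,
        "mean_cycle_time": mean_cycle_time,
        "mean_duration": mean_duration,
    }

# Illustrative (assumed) rates in events per hour.
measures = steady_state_measures(lam=0.1, mu=1.0)
for name, value in measures.items():
    print(name, value)
```

For this model the mean duration reduces to 1 / lam and the mean cycle time to 1 / lam + 1 / mu, which provides a quick check on the figures printed.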

The choice of a proper measure depends upon several factors such as the 
system requirements, the feasibility of calculations, measurability, and so on. 
The various measures respond in different ways to the system parameters and no 
single measure can give a complete picture of system reliability. When dealing 
with repairable systems in the time domain, the time specific availability, 
fractional duration and interval frequency are the most useful parameters. In the 
steady state, availability, frequency or cycle time and the mean duration provide
a good measure of system adequacy. The use of multiple indices can sometimes
create problems regarding decision making. A weighting or majority voting 
procedure can be employed in such cases. Usually one index is more important 
in a given physical environment due to the nature of the system function. 

2 Overall Target and Allocation to Subsystems 

The overall reliability target for a given system is normally determined by 
consultation between management, planning and design functions within the 
organization. The selected target is based on the state of the art and the need 
and desire for further improvements. Target selection consists of setting a 
specific probability, frequency or some other reliability indices as the goals for 
the project reliability program. In general, the targets should be optimistic but 
should also be physically achievable. Reliability is like other parameters such as 
speed, weight and cost and as such it is subject to trade-off with these other 
parameters. For example, the reliability of a particular pumping system may be 
greatly improved by installing spare pumps, but the cost may be prohibitive. 
Once an overall target for the system has been defined, the next step is to 
allocate the targets for the different subsystems which make up the total system. 
This can involve a considerable amount of effort and in this regard, the following 
techniques are helpful. 

Similar Familiar System Technique 

This approach is based on experience and objective judgement and is very useful 
in the conceptual design stage prior to total definition of the system. The 
allocation is based on the state of the art of similar systems performing the 
same basic function. When using this technique, the following factors must be 
carefully considered when making the projection: 

(i) system physical and performance comparison 

(ii) design similarity 

(iii) manufacturing similarity 

All assumptions and conditions required to meet the predicted target figure 
and their implications to the program must be examined and defined. 

Factors of Influence Method 

This technique is used when there is an overall reliability goal and the system or
equipment design is new, or when the system is a modification of an existing
system for which operating experience data are not available. It consists of:

(i) for each system considered, assigning weights to the following factors:

(a) complexity/time of operation

(b) environmental conditions 

(c) state of the art 

(d) criticality 

(ii) for each system, adding the above weights to obtain the system weight 

(iii) obtaining the relative system weight for each system by normalization 

(iv) apportioning the overall targets according to the relative system weights. 
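
Steps (i) to (iv) can be sketched as follows. The subsystem names, the weights and the form of the target are hypothetical. In particular, the sketch assumes that the overall target is a failure-rate budget which can be divided additively among the subsystems, which the text does not specify.

```python
def apportion_target(overall_target, factor_weights):
    """Apportion an overall target according to relative system weights.

    factor_weights maps each subsystem name to a dict of factor -> weight.
    Returns (relative_weights, allocation).
    """
    # Step (ii): add the factor weights of each subsystem.
    system_weights = {name: sum(w.values()) for name, w in factor_weights.items()}
    # Step (iii): normalize to obtain relative system weights.
    total = sum(system_weights.values())
    relative = {name: w / total for name, w in system_weights.items()}
    # Step (iv): apportion the overall target in proportion to the relative weights.
    allocation = {name: overall_target * r for name, r in relative.items()}
    return relative, allocation

# Step (i): hypothetical weights on an arbitrary 1 to 10 scale.
weights = {
    "pump":       {"complexity": 6, "environment": 4, "state_of_art": 3, "criticality": 7},
    "motor":      {"complexity": 4, "environment": 4, "state_of_art": 2, "criticality": 5},
    "controller": {"complexity": 8, "environment": 2, "state_of_art": 6, "criticality": 9},
}
relative, allocation = apportion_target(0.12, weights)  # 0.12 failures per year (assumed)
print(relative)
print(allocation)
```

With this convention a subsystem with a larger weight receives a larger share of the failure budget. If the weights are meant to express how demanding the requirement should be, the proportionality would have to be defined differently.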

3 Reliability Modelling and Evaluation 

This is an important element in any reliability program because the selected 
model provides the basis for predicting the reliability measures. Various 
techniques of reliability modelling and evaluation are discussed in detail in this 
book. These techniques are either direct analytical modelling or simulation or 
a mixture of the two approaches. In the direct analytical modelling method a 
model is built which reasonably idealizes the physical system and is also 
amenable to calculation. The reliability measures are then obtained by 
manipulating the model. This approach is superior to simulation and should be 
used wherever possible. Simulation also employs a mathematical model but 
proceeds by performing sampling experiments on this model. It is more flexible 
but is also more time-consuming and less accurate. Simulation can be used to 
provide estimates of the same basic measures which would be obtained by a 
direct analytical approach. 
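
The statement that simulation estimates the same measures as a direct analytical approach can be illustrated with a minimal sketch. It assumes a single component with exponentially distributed up and down times. The rates, the number of simulated cycles and the seed are arbitrary illustrative choices.

```python
import random

def analytical_availability(lam, mu):
    """Steady state availability of a two-state component, obtained directly."""
    return mu / (lam + mu)

def simulated_availability(lam, mu, cycles, seed=0):
    """Estimate the same measure by sampling alternating up and down times."""
    rng = random.Random(seed)
    up_time = 0.0
    down_time = 0.0
    for _ in range(cycles):
        up_time += rng.expovariate(lam)    # time to failure
        down_time += rng.expovariate(mu)   # time to repair
    return up_time / (up_time + down_time)

lam, mu = 0.1, 1.0
exact = analytical_availability(lam, mu)
estimate = simulated_availability(lam, mu, cycles=20000, seed=1)
print(exact, estimate)
```

The estimate varies with the seed and the sample size, whereas the analytical value does not, which is one sense in which the simulation is less accurate and more time-consuming.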

In the initial stages of design, when only a general system outline is available, 
the analyst may have to be satisfied with rudimentary failure modes, effects and 
criticality analysis (FMECA). This technique can be used to systematically study 
the modes of failure and their effects on the system. The various failure modes can 
be arranged in order of their criticality to the system requirements and provide 
some very useful design modification data. The reliability model is in general 
modified and improved as the design progresses and becomes solidified. 

There is one pitfall which every reliability engineer must guard against. 
Repeating experiments with a mathematical model on a computer can generate 
confidence in the reliability measures so obtained. A closer look at the input 
data and at the model used may prove that this sense of confidence is not really 
justified. The assumptions built into any model and the validity of the data used 
must be carefully considered when interpreting the results provided by the 
model. 

4 Testing and Data Collection 

In many research and development projects, tests will be made at component, 
assembly and system level. A systematic procedure should be adopted for 
collecting data from these trials in a cumulative manner. In many projects it
may not be possible to conduct tests, and data from other sources having
similar units may be used. Data collection activities are a vital part of reliability 
evaluation since without valid data, excessive sophistication in model building 
may be simply an intellectual exercise. 

5 Evaluation of Alternative Designs 

Theoretically, a number of designs should be prepared and the one having 
maximum reliability and satisfying the other constraints should be selected. In 
some cases, the number of alternatives may be relatively small due to physical 
constraints. Reliability analysis provides an additional degree of consistency to 
the evaluation of alternate proposals. In many cases, the decision is based upon 
lowest total cost and therefore the function of the reliability analyst is to ensure 
that the system or design selected does satisfy the required targets. Reliability 
evaluation should provide useful input at the design decision points. If the 
analysis is not done as the design progresses, it does not provide this initial 
function and therefore becomes a bookkeeping exercise after the fact. 

6 Reliability Report 

The final step in any reliability study should be the preparation of a detailed 
report containing information on the reliability program, the trials made and 
the results obtained. The report must contain the assumptions made in 
developing the reliability model and also indicate the level of confidence in the 
data used. The report should be an objective assessment and should enable the 
top management to obtain a proper appraisal of the expected system reliability. 

Scope of This Book 

The general application of reliability engineering concepts is an extremely wide 
field as illustrated by the six broad topics noted earlier. This book makes no 
attempt to deal with all these topics and is devoted entirely to the concepts, 
philosophy and techniques for reliability modelling and evaluation. The systems 
analyst and the reliability expert must be capable of developing valid models for 
reliability evaluation of a system. The necessary background in probability and 
stochastic processes is therefore essential and is reviewed in some detail in 
Chapter 2. The book emphasizes the calculation of more than one reliability 
measure as a single index may not provide a complete picture of the system 
reliability. The book covers both constant transition rate and non-Markovian 
systems. A special chapter is devoted to the problems encountered in reliability 
evaluation of large systems and several solutions are proposed. Reliability 
analysis is an integral part of economic system design. It is extremely important 
that reliability be considered in quantitative rather than qualitative terms and 
therefore provide a consistent and responsive indicator of system adequacy. 



CHAPTER 2 

The Preliminaries 



Introduction 

An appreciation of certain basic probability concepts is essential for the
development of reliability models and their subsequent evaluation. In general,
probability mathematics provides the medium for examination of systems
which exhibit random phenomena, i.e. behave in accordance with probabilistic
rather than deterministic laws. It has been assumed that the reader is familiar
with the basic probability concepts normally encountered in an undergraduate
engineering course. Many textbooks and, therefore, courses introduce
probability theory either as an abstract mathematical concept or through the
use of a priori situations such as dice or playing-card type examples. In these
cases, it is often difficult to develop an easy and effective interface between 
the basic reliability and probability concepts. This chapter reviews the basic 
probability theory required in subsequent chapters with emphasis on utilization 
in system reliability modelling and evaluation. 

Sample Space 

The set of all possible outcomes of a random phenomenon is called the sample 
space, sample description space or possibility space. As an example consider two 
transmission links each existing either in the up state (U) or the down state (D).
The description of the possible states at any time is given by the set

$$S = \{(1U, 2U),\ (1U, 2D),\ (1D, 2U),\ (1D, 2D)\}$$

A set which has a definite number of elements is called a finite set. A set which is 
not finite but whose elements can be put in a one to one correspondence with 
the set of natural numbers is said to be countably infinite or denumerable. Both 
of these types, the finite and the denumerable set come under the general name 
of countable set. 

As another example, consider the load on a pumping system. This may assume
any value between $L_0$, the minimum load, and $L_1$, the maximum peak load. The
sample space $S$, therefore, consists of all points $s$ such that

$$L_0 < s < L_1$$

The interval $(L_0, L_1)$ contains a noncountable infinity of members. Such a
sample space which is not countable is called uncountably infinite or simply
uncountable.



Events 

Consider the descriptions (1U, 2D) and (1D, 2U) in the example of two
transmission links. These descriptions define the event that one of the
transmission links has failed. Similarly, if both the transmission links are needed
to keep the system in operation, the set {(1D, 2D), (1U, 2D), (1D, 2U)} defines
the event that the system has failed. An event can therefore be defined as a set
of descriptions and the event E is said to have occurred if the outcome of the
random phenomenon is a member of E. As the sample space contains the
descriptions of all possible outcomes of the random phenomenon under
consideration, an event may also be defined as any subset of the sample
description space. In the case of demand on the pumping system, the subset
of $S$ with $s \ge L_2$ defines the event that the load is greater than or equal to $L_2$.
Whenever the load is in the interval $(L_2, L_1)$, the event 'load is greater than $L_2$'
is said to have occurred. The events are sets and therefore the algebra of events
is essentially that of set theory.
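
Because events are sets, the two-link example maps directly onto set operations. The following is a minimal sketch; the string labels and the variable names are illustrative choices, not notation from the text.

```python
from itertools import product

# Sample space for two transmission links, each up (U) or down (D).
sample_space = {(f"1{a}", f"2{b}") for a, b in product("UD", repeat=2)}

def links_down(outcome):
    """Number of links in the down state for one outcome description."""
    return sum(1 for label in outcome if label.endswith("D"))

# Event: exactly one of the transmission links has failed.
one_link_failed = {s for s in sample_space if links_down(s) == 1}

# Event: the system has failed when both links are needed for operation.
system_failed = {s for s in sample_space if links_down(s) >= 1}

def occurred(event, outcome):
    """An event has occurred if the observed outcome is a member of it."""
    return outcome in event

print(sorted(sample_space))
print(sorted(one_link_failed))
print(one_link_failed <= system_failed)        # subset relation between events
print(occurred(system_failed, ("1U", "2U")))
```

Union, intersection and complement of events are then available through the usual set operators `|`, `&` and `sample_space - event`.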



Random Variables 

A random variable is a quantity which assumes values in accordance with certain 
probabilistic laws. A random variable which assumes discrete values is called a 
discrete random variable and one which assumes values from a continuous interval 
is termed a continuous random variable. This definition of a random variable is 
sufficient for the purpose of this book. A more precise definition of a random 
variable is as a function defined on a sample space in which certain technical 
conditions are satisfied. The random variable relabels the descriptions of 
outcomes contained in S in terms of real numbers. The domain of the random 
variable is the sample space S and the range is contained in the set Re of all real 
numbers. 

Consider again the example of the two transmission links. Let X be the 
function defined on the sample space S, where X denotes the number of 
components down. The values of this discrete random variable are given below: 

$$X(s) = \begin{cases}
0, & s = (1U, 2U) \\
1, & s = (1U, 2D),\ (1D, 2U) \\
2, & s = (1D, 2D)
\end{cases}$$

In the above example, instead of describing the outcomes as shown above,
they could be assigned integer values by a random variable $Z$ such that

$$Z = \begin{cases}
0, & s = (1U, 2U) \\
1, & s = (1U, 2D) \\
2, & s = (1D, 2U) \\
3, & s = (1D, 2D)
\end{cases}$$

The random variable Z depicts the state of the system and its different values 
are termed the state space. If a device is put into operation at time t = 0, the 
time to failure can be denoted by the continuous random variable $X$. Some
other examples of a continuous random variable $X$ assuming a value $x$ are
temperature ($-273\,^{\circ}\mathrm{C} < x < \infty$), the load on a power system ($L_0 < x < L_1$), the
repair time of a component ($0 < x < \infty$) and the noise voltage at an amplifier
output point.
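
The view of a random variable as a function on the sample space can be made concrete with the two-link example. In this sketch each outcome is written as a pair of states for link 1 and link 2; the representation and the names are illustrative assumptions.

```python
# Outcomes as (state of link 1, state of link 2), in the order used for Z.
sample_space = [("U", "U"), ("U", "D"), ("D", "U"), ("D", "D")]

def X(outcome):
    """Random variable X: the number of components down."""
    return sum(1 for state in outcome if state == "D")

# Random variable Z: relabels each outcome with an integer state number.
Z = {outcome: index for index, outcome in enumerate(sample_space)}

for outcome in sample_space:
    print(outcome, "X =", X(outcome), "Z =", Z[outcome])

state_space = sorted(Z.values())   # the values of Z form the state space
print(state_space)
```

Note that X is many-to-one, since two outcomes share the value 1, whereas Z assigns a distinct value to every outcome.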

Probability Laws 

In application, the functional form of the random variable is not usually of great 
interest. The main focus is usually on the probability with which the random 
variable assumes a certain value. Probability is a function which assigns a number 
between 0 and 1 to a set of points (event) in S. The statement that the random 
variable $X$ assumes a value in set $B$ of real numbers implies that the event defined
by the subset $\{s : X(s) \text{ is in } B\}$ occurs. Therefore

$$P[\{s : X(s) \text{ is in } B\}] = P[X \text{ is in } B]$$



This is the basic formula for obtaining the probability function of the random 
variable X from the probability function which exists on the sample space S on 
which the random variable X is defined as a function. Some other types of events 
in terms of a random variable can be defined for fixed numbers $x$, $a$, $b$:

$$\begin{aligned}
[X = x] &= \{s : X(s) = x\} \\
[X \le x] &= \{s : X(s) \le x\} \\
[X > x] &= \{s : X(s) > x\} \\
[a < X \le b] &= \{s : a < X(s) \le b\}
\end{aligned}$$



A discrete random variable assumes only discrete values $x_i$, $i = 0, 1, 2, \ldots$ from
the set Re of real numbers. The probability density function for a discrete random
variable is defined by

$$p_X(x) = P[X = x] \tag{2.1}$$

This function should clearly have the following properties:

(i) $p_X(x) = 0$ unless $x$ is one of $x_0, x_1, x_2, \ldots$

(ii) $0 \le p_X(x) \le 1$

(iii) $\sum_i p_X(x_i) = \sum_i P[X = x_i] = 1$

The probability density function for a discrete random variable is sometimes
also called the probability mass function of $X$. The probability function $P_X(B)$
of the random variable in terms of the probability mass function is given by

$$P_X(B) = P[X \text{ is in } B] = \sum_{x_i \in B} p_X(x_i)$$

The probability distribution function $F_X(x)$ of the discrete random variable is
given by

$$F_X(x) = P[X \le x] = \sum_{x_i \le x} p_X(x_i)$$

It should be noted that the domain of the distribution function is the set of all
real numbers and the range, being a probability, is the interval from 0 to 1. This is in
contrast to the case of a random variable whose domain is the sample space.
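
The step from the probability function on the sample space to the mass function and distribution function of a random variable can be sketched for the number of links down. The sketch assumes that the two links fail independently and that each is down with probability `q = 0.1`. Both the independence and the value are assumptions made for illustration.

```python
from itertools import product

q = 0.1  # assumed probability that a link is down (illustrative value)

def outcome_probability(outcome):
    """Probability of one outcome, assuming independent links."""
    prob = 1.0
    for state in outcome:
        prob *= q if state == "D" else 1.0 - q
    return prob

# Probability mass function of X = number of links down:
# p_X(x) is the total probability of the outcomes s with X(s) = x.
pmf = {}
for outcome in product("UD", repeat=2):
    x = outcome.count("D")
    pmf[x] = pmf.get(x, 0.0) + outcome_probability(outcome)

def prob_in(values):
    """P_X(B): sum of the mass function over the values contained in B."""
    return sum(pmf.get(v, 0.0) for v in values)

def cdf(x):
    """F_X(x) = P[X <= x], defined for every real number x."""
    return sum(p for xi, p in pmf.items() if xi <= x)

print(pmf)
print(prob_in({1, 2}))
print([cdf(x) for x in (-1, 0, 0.5, 1, 2, 5)])
```

The distribution function is a step function: it is defined for any real argument, although the mass is concentrated on the values 0, 1 and 2.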

A continuous random variable can assume any value over a continuous 
interval. Since the number of elementary events in a continuous interval is 
infinite, the probability of the random variable X assuming a value exactly 
equal to x is zero. This is, however, not an impossible event. For example, 
although the time to actual failure of a component may be x, the probability 
of this event happening is zero. The probability density function of the form in 
Equation (2.1) is therefore not suitable for a continuous random variable. The 
probability density function $f_X(x)$ of a continuous random variable $X$ is so
defined that

$$P[a < X \le b] = \int_a^b f_X(y)\,dy \tag{2.2}$$



The probability distribution function $F_X(x)$ can now be written as

$$F_X(x) = P[-\infty < X \le x] = \int_{-\infty}^{x} f_X(y)\,dy \tag{2.3}$$

It follows from Equation (2.3) that

$$f_X(x) = F_X'(x) \tag{2.4}$$

The function $f_X(x)$ has the following properties:

1. $f_X(x)$ is non-negative

2. $\int_{-\infty}^{+\infty} f_X(x)\,dx = 1$

3. The function $f_X(x)$ is continuous at all but a finite number of points, i.e. it
is piece-wise continuous.

When we are dealing with the distributions of operating and down times, the 
random variable X is non-negative and its density function is zero over the 
negative range. It is sometimes more convenient to work with the complementary
function of $F(x)$, called the survivor function

$$\mathcal{F}(x) = P[X > x] = \int_x^{\infty} f(y)\,dy = 1 - F(x) \tag{2.5}$$

It follows from this expression that

$$f(x) = -\mathcal{F}'(x) \tag{2.6}$$

In reliability modelling and evaluation one function which is used extensively
and which is equivalent to $f(x)$ is the hazard function. In practice, depending
upon the circumstances employed, it may be known by a variety of names, age
specific failure rate or simply failure rate, repair rate, hazard rate, force of
mortality etc. A detailed interpretation of this function is given in the next
chapter while developing the concept of frequency. This can be defined as

$$\phi(x) = \lim_{\Delta x \to 0} \frac{P[x < X \le x + \Delta x \mid x < X]}{\Delta x} \tag{2.7}$$

With the interval length approaching zero, Equation (2.2) can be written as

$$f(x) = \lim_{\Delta x \to 0} \frac{P[x < X \le x + \Delta x]}{\Delta x}$$

Equation (2.7) can be rewritten in the form

$$\phi(x) = \lim_{\Delta x \to 0} \frac{P[x < X \le x + \Delta x]}{\Delta x \, P[x < X]} = \frac{f(x)}{\mathcal{F}(x)} \tag{2.8}$$

where $\mathcal{F}(x) = P[X > x]$ is the survivor function. It is usual to drop the suffix $X$
denoting the random variable unless there is a likelihood of incorrect interpretation
and this has been done in this text. Using Equations (2.6) and (2.8)

$$\phi(x) = -\frac{\mathcal{F}'(x)}{\mathcal{F}(x)} = -\frac{d}{dx}\left[\log \mathcal{F}(x)\right] \tag{2.9}$$

Integrating Equation (2.9) and using the condition that $\mathcal{F}(0) = 1$

$$\mathcal{F}(x) = \exp\left[-\int_0^x \phi(y)\,dy\right]$$

and therefore

$$f(x) = \phi(x) \exp\left[-\int_0^x \phi(y)\,dy\right]$$

This expression shows that $\phi(x)$, the hazard function, uniquely determines
the probability density function. The probability density function, the
probability distribution function, the survivor function and the hazard rate
function are mathematically equivalent.
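
The equivalence between the hazard function and the survivor function can be checked numerically. The sketch below integrates an arbitrary hazard function with a simple midpoint rule and compares the result with the closed-form survivor function of a Weibull distribution. The scale-and-shape parameterization, the parameter values and the function names are assumptions made for illustration.

```python
import math

def survivor_from_hazard(hazard, x, steps=10000):
    """Survivor function exp(-integral of hazard from 0 to x), midpoint rule."""
    h = x / steps
    integral = sum(hazard((k + 0.5) * h) for k in range(steps)) * h
    return math.exp(-integral)

def density_from_hazard(hazard, x, steps=10000):
    """Density f(x) = hazard(x) * survivor(x)."""
    return hazard(x) * survivor_from_hazard(hazard, x, steps)

# Assumed Weibull parameters: scale alpha and shape beta.
alpha, beta = 10.0, 2.0

def weibull_hazard(x):
    return (beta / alpha) * (x / alpha) ** (beta - 1)

def weibull_survivor(x):
    return math.exp(-((x / alpha) ** beta))

x = 5.0
numeric = survivor_from_hazard(weibull_hazard, x)
closed_form = weibull_survivor(x)
print(numeric, closed_form)
print(density_from_hazard(weibull_hazard, x))
```

A constant hazard passed to the same function reproduces the exponential survivor function, which is the special case of constant transition rates.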






Expectation 

The probabilistic behaviour of a random variable is completely defined by the 
probability density function or distribution function. It is, however, often of 
interest to obtain a single value which may represent the random variable and 
its probability distribution. One such characteristic value is the expectation or 
mean. This is denoted by $E(X)$ and given by

$$E(X) = \sum_i x_i\,P[X = x_i] = \sum_i x_i\,p(x_i) \quad \text{if } X \text{ is a discrete random variable}$$

or

$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx \quad \text{if } X \text{ is a continuous random variable} \tag{2.10}$$

The expectation is said to exist if the series or the integral involved converges
absolutely, i.e.

$$\sum_i |x_i|\,p(x_i) < \infty \quad \text{for the discrete case}$$

and

$$\int_{-\infty}^{\infty} |x|\,f(x)\,dx < \infty \quad \text{for the continuous case}$$

The expectation or the mean value has a meaningful interpretation in terms 
of the average of a sample. This interpretation is provided by the law of large 
numbers. Assume that there are $n$ random variables $X_1, X_2, \ldots, X_n$ which are
identically distributed as $X$ and each has a mean $m$. This set of $n$ random
variables represents a random sample of $n$ observations $X_1, X_2, \ldots, X_n$. The
sample mean, which is also a random variable, is given by

$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

According to the law of large numbers, for any constant $c > 0$

$$\lim_{n \to \infty} P[\,|\bar{X} - m| > c\,] = 0 \tag{2.11}$$

This implies that as the sample size increases, the sample mean approaches 
the mean of the random variable. Thus if the random variable X is observed 
many times and each time the arithmetic mean is calculated, it will approach
the mean or the expectation of the random variable X as the number of
observations becomes very large. 
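The convergence of the sample mean can be illustrated by simulation. This is a minimal sketch, assuming exponentially distributed observations with mean 2 and a fixed seed; both choices are illustrative and not part of the text.

```python
import random

def sample_mean(n, mean=2.0, seed=0):
    """Arithmetic mean of n simulated observations with the given true mean."""
    rng = random.Random(seed)
    # expovariate takes the rate, i.e. the reciprocal of the mean.
    return sum(rng.expovariate(1.0 / mean) for _ in range(n)) / n

for n in (10, 1_000, 100_000):
    print(n, sample_mean(n))
```

As n grows the printed values should settle near the assumed mean of 2, although any single run with small n may be well off.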

An important result concerning the sum of random variables is given by

$$E(X_1 + X_2 + \cdots + X_n) = E(X_1) + E(X_2) + \cdots + E(X_n)$$

That is, the expectation of the sum of a group of random variables is equal
to the sum of the expectations of the random variables. This result holds even if
the random variables are not independent.



Variance 

The arithmetic mean indicates the central tendency and provides a value around 
which the random variable X is distributed. Two or more random variables may 
have the same mean value but the deviations from this value may have different 
likelihoods. The information regarding the scatter of the values around the mean 
is provided by the variance of X, designated V(X)

$$V(X) = E[(X - E(X))^2] \tag{2.12}$$

$$= \sum_i (x_i - m)^2\,p(x_i) \quad \text{for the discrete case}$$

and

$$= \int_{-\infty}^{\infty} (x - m)^2 f(x)\,dx \quad \text{for the continuous case}$$



The variance is a weighted average of the values $(X - m)^2$. A large variance
therefore means that the distribution of X is such that large deviations occur with a
comparatively high probability. On the other hand if the values are near m with
large probability, the variance is comparatively small. The variance, therefore, 
provides us with a measure of the spread of the distribution. Equation (2.12) 
can be put into another form more suitable for computation 

$$V(X) = E[(X - E(X))^2] = E(X^2) - (E(X))^2 \tag{2.13}$$

The variance is often denoted by $\sigma^2$ and has the dimensions of $X^2$. The square
root of the variance, $\sigma$, has the dimensions of X and is called the standard
deviation.
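The two forms of the variance, the definition (2.12) and the computational form (2.13), can be compared on a small discrete distribution. The sketch below assumes a fair six-sided die as the example distribution and uses exact fractions; the helper names are hypothetical.

```python
from fractions import Fraction

def mean(dist):
    """E(X) for a discrete distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())

def variance_definition(dist):
    """V(X) = sum over i of (x_i - m)^2 p(x_i)."""
    m = mean(dist)
    return sum((x - m) ** 2 * p for x, p in dist.items())

def variance_shortcut(dist):
    """V(X) = E(X^2) - (E(X))^2."""
    return sum(x * x * p for x, p in dist.items()) - mean(dist) ** 2

# Assumed example: a fair die.
die = {k: Fraction(1, 6) for k in range(1, 7)}
print(mean(die), variance_definition(die), variance_shortcut(die))
```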






Covariance 

Consider the two random variables X and Y. The quantities (X - E(X)) and
(Y - E(Y)) are the deviations of the two random variables from their respective
means. The expected value of the product of these deviations is called the
covariance and is given by

$$\mathrm{Cov}(X, Y) = E[(X - E(X))(Y - E(Y))] = E(XY) - E(X)E(Y) \tag{2.14}$$

In terms of the joint probability density function f(x, y) of X and Y, it is
given as

$$\mathrm{Cov}(X, Y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - m_x)(y - m_y)\,f(x, y)\,dx\,dy \tag{2.15}$$

where f(x, y) is so defined that

$$P(x < X < x + dx,\ y < Y < y + dy) = f(x, y)\,dx\,dy$$

The covariance gives a measure of the tendency of the two random variables 
to vary together. If both the deviations have the same sign, then the sign of the 
product is positive and otherwise it is negative. Therefore, if the two variables 
tend to vary in harmony, the sign of the product will be positive with a larger 
probability and the covariance will, therefore, be positive. On the other hand if 
the two variables tend to vary in opposition, the sign of the covariance is 
negative. When the two random variables are independent, their covariance is 
zero. This tendency to vary together or in opposition is often measured by the 
dimensionless quantity called the correlation coefficient

$$C_{xy} = \frac{\mathrm{Cov}(X, Y)}{\sqrt{V(X)\,V(Y)}}$$

which can be shown to lie in the range [-1, 1] . The correlation coefficient is the 
covariance normalized by the product of the standard deviations and its sign 
is the same as that for covariance. The statements made about the sign of the 
covariance, therefore, apply to the correlation coefficient too. It should be 
noted that correlation and non-correlation are similar to dependence and
independence but the two are not identical. If the two random variables
are independent then $C_{xy} = 0$, but the reverse is not always true, i.e.
independence does not necessarily follow from non-correlation. Correlation
implies dependence but the reverse is not necessarily true, i.e. $C_{xy} \neq 0$ does not
follow from dependence.
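A small example makes the distinction between non-correlation and independence concrete. The sketch below assumes X uniform on {-1, 0, 1} and Y = X^2 (an illustrative pair, not from the text): the covariance is zero although Y is completely determined by X.

```python
from fractions import Fraction

# Assumed joint probability mass function of (X, Y) with Y = X**2.
joint = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}

def expectation(g):
    """E[g(X, Y)] under the joint distribution."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

def covariance():
    """Cov(X, Y) = E(XY) - E(X)E(Y)."""
    e_xy = expectation(lambda x, y: x * y)
    return e_xy - expectation(lambda x, y: x) * expectation(lambda x, y: y)

def is_independent():
    """True only if every joint probability equals the product of the marginals."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    for x in xs:
        for y in ys:
            p_x = sum(p for (a, _), p in joint.items() if a == x)
            p_y = sum(p for (_, b), p in joint.items() if b == y)
            if joint.get((x, y), Fraction(0)) != p_x * p_y:
                return False
    return True

print(covariance(), is_independent())
```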



Moments 

The expectation of a real valued function g(X) of X is given by 

(i) $E[g(X)] = \sum_i g(x_i)\,p(x_i)$ if X is discrete

(ii) $E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx$ if X is continuous. (2.17)

One simple function whose expectation is of interest is the kth power of 
X, i.e. $g(X) = X^k$. The expectation of this function is called the kth initial
moment of X or of the distribution of X. Therefore the kth initial moment of
X is given by

$$m_k(X) = m_k = \sum_i x_i^k\,p(x_i) \quad \text{for the discrete case}$$

$$= \int_{-\infty}^{\infty} x^k f(x)\,dx \quad \text{for the continuous case.} \tag{2.18}$$



A somewhat more useful concept, the kth central moment, is similarly 
defined by 



$$M_k(X) = M_k = \sum_i (x_i - m)^k\,p(x_i) \quad \text{if } X \text{ is discrete}$$

$$= \int_{-\infty}^{\infty} (x - m)^k f(x)\,dx \quad \text{if } X \text{ is continuous.} \tag{2.19}$$



It can be seen that the mean is the first initial moment, the mean square 
value is the second initial moment and the variance is the second central 
moment. 

In a similar manner, the mixed initial moment for the multi-dimensional 
distribution can be defined as 



$$m_{k_1, k_2, \ldots, k_n}(X_1, X_2, \ldots, X_n) = E\left(X_1^{k_1} X_2^{k_2} \cdots X_n^{k_n}\right) \tag{2.20}$$

and the mixed central moment is given by

$$M_{k_1, k_2, \ldots, k_n}(X_1, X_2, \ldots, X_n) = E\left[(X_1 - E(X_1))^{k_1} (X_2 - E(X_2))^{k_2} \cdots (X_n - E(X_n))^{k_n}\right] \tag{2.21}$$

The covariance is therefore the first central mixed moment of X and Y.

Coefficients of Skewness and Excess 

It can be seen from Equation (2.19) that odd moments vanish for a distribution 
which is symmetrical about the mean value. The third moment can, therefore, be 
used as a measure of the asymmetry of the distribution. Asymmetry is more 
conveniently measured as a dimensionless quantity and the asymmetry 
coefficient or skew coefficient is given by 

$$A = \frac{M_3}{(M_2)^{3/2}} \tag{2.22}$$

The coefficient of excess also gives information about the form of 
distribution by comparing it with the normal probability density function near 
its mode and is defined by 

$$G = \frac{M_4}{(M_2)^2} - 3 \tag{2.23}$$

$M_4 = 3M_2^2$ for the normal distribution and therefore the coefficient of excess
is zero. In the case of a distribution having the same variance as the normal, 
G > 0 indicates that the distribution has a sharper peak than the normal 
distribution and similarly G < 0 indicates a comparatively flatter peak. 
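The coefficients of skewness and excess follow directly from the central moments. The sketch below is a minimal illustration for discrete distributions, assuming a fair die (symmetric) and a two-point distribution (asymmetric) as examples; neither appears in the text.

```python
from fractions import Fraction

def central_moment(dist, k):
    """kth central moment M_k of a discrete distribution {value: probability}."""
    m = sum(x * p for x, p in dist.items())
    return sum((x - m) ** k * p for x, p in dist.items())

def skewness(dist):
    """A = M_3 / (M_2)^(3/2), returned as a float because of the fractional power."""
    return float(central_moment(dist, 3)) / float(central_moment(dist, 2)) ** 1.5

def excess(dist):
    """G = M_4 / (M_2)^2 - 3."""
    return central_moment(dist, 4) / central_moment(dist, 2) ** 2 - 3

# Assumed examples.
die = {k: Fraction(1, 6) for k in range(1, 7)}
two_point = {0: Fraction(3, 4), 4: Fraction(1, 4)}

print(skewness(die), excess(die))
print(skewness(two_point), excess(two_point))
```

The symmetric die has zero skewness and a negative excess, consistent with a flatter shape than the normal density; the two-point distribution has positive skewness.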



Transform Methods 

Operational or transform methods are employed for transforming the problem 
into a functional form which at first glance appears to have nothing to do with 
the original problem but which often facilitates its solution. They are used in 
many branches of mathematics especially in solving differential equations. One 
technique of importance in probability theory is the method of characteristic 
functions. The characteristic function of a random variable X is defined by 

$$\phi(\theta) = \int_{-\infty}^{\infty} \exp(i\theta x)\,f(x)\,dx \tag{2.24}$$

where $i = \sqrt{-1}$

and f(x) = the probability density function of X

The probability density function can be found from the characteristic 
function by the inversion formula 

$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \phi(\theta) \exp(-i\theta x)\,d\theta \tag{2.25}$$

There is a one to one correspondence between the characteristic function 
and the probability density function. Therefore, two characteristic functions 
which are equal at all points are the characteristic functions of the same 
distribution function and vice versa. 

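Differentiating (2.24) k times

$$\phi^{(k)}(\theta) = i^k \int_{-\infty}^{\infty} x^k \exp(i\theta x)\,f(x)\,dx$$

This is with the assumption that the kth derivative exists. As $\theta \to 0$

$$\phi^{(k)}(0) = i^k \int_{-\infty}^{\infty} x^k f(x)\,dx = i^k m_k$$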

Therefore the kth moment of the random variable X, if it exists, can be 
obtained from the kth derivative of the characteristic function evaluated at 
zero 

$$m_k = i^{-k}\,\phi^{(k)}(0) \tag{2.26}$$

An important result in probability theory is that the characteristic function 
of the sum of several independent random variables is the product of the 
characteristic functions of the random variables. Let 

$$Y = X_1 + X_2 + \cdots + X_n$$

where $X_i$ are the independent random variables. Then

$$\phi_Y(\theta) = E(e^{i\theta Y}) = E\left(e^{i\theta(X_1 + X_2 + \cdots + X_n)}\right)$$

Since the random variables are independent

$$\phi_Y(\theta) = E(e^{i\theta X_1})\,E(e^{i\theta X_2}) \cdots E(e^{i\theta X_n}) = \phi_{X_1}(\theta)\,\phi_{X_2}(\theta) \cdots \phi_{X_n}(\theta) \tag{2.27}$$

Another transform often used is the moment generating function, defined 
for all real numbers by 

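$$\psi(\theta) = E(e^{\theta X})$$

$$= \int_{-\infty}^{\infty} e^{\theta x} f(x)\,dx \quad \text{for the continuous case}$$

$$= \sum_i e^{\theta x_i}\,p(x_i) \quad \text{for the discrete case.} \tag{2.28}$$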

As in the case of the characteristic function, the following results are 
important 

$$m_k = \psi^{(k)}(0) \tag{2.29}$$

and

$$\psi_Y(\theta) = \psi_{X_1}(\theta)\,\psi_{X_2}(\theta) \cdots \psi_{X_n}(\theta) \tag{2.30}$$

when

$$Y = X_1 + X_2 + \cdots + X_n$$

$X_i$ being independent random variables.

The Expression (2.29) is suitable for calculating the kth initial moment and 
the central moments can be calculated from the initial moments. The moment 
generating function about the mean m or in fact any other point can also be 
defined as 

$$\psi_m(\theta) = E\left(e^{(X - m)\theta}\right) = e^{-m\theta}\,\psi(\theta) \tag{2.31}$$

The kth moment about the mean can be calculated from its kth derivative
evaluated at zero, i.e.

$$M_k = \psi_m^{(k)}(0) \tag{2.32}$$



The main advantage of the characteristic function over the moment generating 
function is that it always exists whereas the moment generating function may not 
exist. 






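For an integer valued discrete random variable X having a probability mass
function $p_i$, the moment generating function can be simplified by substituting
$z = e^{\theta}$

$$\psi(z) = \sum_i z^i p_i \tag{2.33}$$

This function is called the probability generating function of X or the Z
transform of X. It can be shown that

$$p_k = \frac{\psi^{(k)}(0)}{k!} \tag{2.34}$$

$$E[X(X-1)\cdots(X-k+1)] = \psi^{(k)}(1) \tag{2.35}$$

The derivatives with respect to z at $z = 1$ thus give the factorial moments, from
which the initial moments follow; in particular the mean is $m_1 = \psi^{(1)}(1)$.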
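The use of the probability generating function can be illustrated by treating it as a polynomial in z. The sketch below assumes a fair die as the integer valued random variable and stores the function as a list of coefficients $p_i$; the helper names are hypothetical. Note that derivatives taken with respect to z at z = 1 give the factorial moments, so the first derivative is the mean and the second is E[X(X-1)].

```python
from fractions import Fraction
from math import factorial

def derivative(coeffs, order=1):
    """Coefficients of the order-th derivative of sum_i coeffs[i] * z**i."""
    for _ in range(order):
        coeffs = [i * c for i, c in enumerate(coeffs)][1:]
    return coeffs

def evaluate(coeffs, z):
    """Value of the polynomial at z."""
    return sum(c * z ** i for i, c in enumerate(coeffs))

# Assumed example: fair die, p_0 = 0 and p_1 = ... = p_6 = 1/6.
die_pgf = [Fraction(0)] + [Fraction(1, 6)] * 6

p_3 = evaluate(derivative(die_pgf, 3), 0) / factorial(3)   # recovers p_3
die_mean = evaluate(derivative(die_pgf, 1), 1)             # E(X)
second_factorial = evaluate(derivative(die_pgf, 2), 1)     # E[X(X-1)]
print(p_3, die_mean, second_factorial)
```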



Reliability modelling is usually concerned with non-negative random variables 
and for these the Laplace transform is very useful. In general if g(t) is a real 
function of a variable t defined for t > 0, then the Laplace transform of this 
function is given by 

$$L[g(t)] = \bar{g}(s) = \int_0^{\infty} g(t)\,e^{-st}\,dt \tag{2.36}$$

where s is a complex variable. The inverse Laplace transform is defined by

$$L^{-1}[\bar{g}(s)] = g(t) = \frac{1}{2\pi i}\int_{c - i\infty}^{c + i\infty} \bar{g}(s)\,e^{st}\,ds$$

where $i = \sqrt{-1}$



In practice it is seldom necessary to perform this contour integration and the 
inverse is calculated by expanding $\bar{g}(s)$ into partial fractions and using the tables of
Laplace transforms. The following results are of importance with respect to this 
book, 

1. The Laplace transform of the derivative of a function g(t) whose Laplace
transform is $\bar{g}(s)$ is given by

$$L\left[\frac{dg(t)}{dt}\right] = s\bar{g}(s) - g(0^+) \tag{2.37}$$

Here $g(0^+)$ is the limit of g(t) as t approaches zero from positive values. Very
often this will be written simply as g(0).

2. The Laplace transform of an integral is given by

$$L\left[\int_0^t g(u)\,du\right] = \frac{\bar{g}(s)}{s} \tag{2.38}$$

3. The Initial Value Theorem states

$$g(0^+) = \lim_{t \to 0} g(t) = \lim_{s \to \infty} s\bar{g}(s) \tag{2.39}$$

4. The Final Value Theorem states that

$$g(\infty) = \lim_{t \to \infty} g(t) = \lim_{s \to 0} s\bar{g}(s) \tag{2.40}$$

if the limit of g(t) exists.



While considering the Laplace transform of a probability density function, it 
should be observed that except for the sign of s it is the same as the moment 
generating function. Therefore 

$$L[f(x)] = \bar{f}(s) = E(e^{-sX})$$

As f(x) is a probability density function it follows from (2.36) that

$$\bar{f}(0) = 1$$

Using the result (2.38), the Laplace transform of the probability distribution
function is given by

$$\bar{F}(s) = \frac{\bar{f}(s)}{s} \tag{2.41}$$

and the Laplace transform of the survivor function is given by

$$\bar{R}(s) = \frac{1 - \bar{f}(s)}{s} \tag{2.42}$$

The $\bar{f}(s)$ can be written as

$$\bar{f}(s) = E\left[\sum_{k=0}^{\infty} \frac{(-sX)^k}{k!}\right] = \sum_{k=0}^{\infty} \frac{(-s)^k}{k!}\,m_k$$

The kth initial moment of X is given by the coefficient of $(-s)^k/k!$ in the Taylor
expansion of $\bar{f}(s)$. Similarly for the kth central moment $M_k$, $e^{sm}\bar{f}(s)$ should be
expanded.



Some Special Distributions 

This book is mainly concerned with distributions of continuous random 
variables such as the time to failure and time to repair. Any function which is 
non-negative and whose integral over its range equals one, can be a probability 
density function. There are, however, some special mathematical forms which 
are often used in reliability modelling as probability density functions. The 
following section examines these special distributions. 



1 Exponential Distribution 

A non-negative continuous random variable is said to have a negative exponential 
distribution if it has a probability density function defined as 

$$f(x) = \rho\,e^{-\rho x} \tag{2.44}$$

where $\rho$ is some positive constant. The corresponding distribution function is
given by

$$F(x) = \int_0^x f(u)\,du = 1 - e^{-\rho x} \tag{2.45}$$

The survivor function is

$$R(x) = \int_x^{\infty} f(u)\,du = e^{-\rho x} \tag{2.46}$$

The hazard function is

$$\phi(x) = \frac{f(x)}{R(x)} = \rho \tag{2.47}$$






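The Laplace transform of this distribution is given by

$$\bar{f}(s) = \int_0^{\infty} \rho\,e^{-\rho x} e^{-sx}\,dx = \frac{\rho}{\rho + s}$$

$$\text{The mean} = (-1)\bar{f}^{(1)}(s)\big|_{s=0} = \rho(\rho + s)^{-2}\big|_{s=0} = \frac{1}{\rho} \tag{2.48}$$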



It can be seen from Expressions (2.47) and (2.48) that the hazard rate for the
exponential distribution is constant and equals the reciprocal of the mean of 
the random variable having this distribution. Since the hazard rate uniquely 
determines the probability density function, it follows that if the hazard is 
constant then 



$$f(x) = \rho \exp\left[-\int_0^x \rho\,du\right] = \rho\,e^{-\rho x}$$



That is the distribution is exponential. It is worth repeating that only the 
exponential distribution has the property that the hazard rate is constant and is 
equal to the reciprocal of the mean. The second initial moment is 

$$m_2 = (-1)^2 \bar{f}^{(2)}(s)\big|_{s=0} = \frac{2}{\rho^2}$$

and therefore the variance

$$V(X) = m_2 - m_1^2 = \frac{1}{\rho^2} \tag{2.49}$$

The standard deviation is $1/\rho$ and the coefficient of variation equals one. The
graphs of the probability density, probability distribution and the hazard 
function are shown in Fig. 2.1 . The exponential law corresponds to the 
maximum randomness of the lifetimes of the components. The life of a 
component observing the exponential law has the interesting property that the 
previous operating time of the component does not affect the residual or
remaining lifetime distribution. Consider that a component whose lifetime is
represented by an exponentially distributed random variable X is operated up to
time t; then the distribution function of the residual life $Y = X - t$ is given by

$$\begin{aligned}
F_Y(x) &= P[X - t \le x \mid X > t] \\
&= P[X \le x + t \mid X > t] \\
&= \frac{P[t < X \le x + t]}{P[X > t]} \\
&= \frac{\int_t^{x+t} \rho\,e^{-\rho u}\,du}{e^{-\rho t}} \\
&= 1 - e^{-\rho x} \\
&= F_X(x)
\end{aligned}$$



[Fig. 2.1 Characteristics of the exponential distribution: probability density function, survivor function and hazard rate]

It can, therefore, be seen that the distribution of the residual life time of the 
component is independent of the time for which the component has been 
operating. It is as if the component forgets how long it has been operating and 
the breakdown occurs not because of gradual deterioration but a randomly 
occurring failure. The negative exponential is the only probability distribution 
having this memory loss property. Its proof is left to the reader in Exercise 1 . 
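The memory loss property can be checked from the survivor function alone, since $P[X > t + x \mid X > t] = R(t + x)/R(t)$. The sketch below assumes $\rho = 0.5$ and compares the exponential law with a Weibull law of shape 2, for which the property fails; the parameter values are illustrative.

```python
import math

def residual_survivor(survivor, t, x):
    """P[X > t + x | X > t] = R(t + x) / R(t)."""
    return survivor(t + x) / survivor(t)

RHO = 0.5  # assumed rate parameter, for illustration only

def exponential_survivor(x):
    return math.exp(-RHO * x)

def weibull_survivor(x, alpha=2.0):
    return math.exp(-((RHO * x) ** alpha))

# Exponential: the residual survivor does not depend on the age t.
print(residual_survivor(exponential_survivor, 3.0, 1.0), exponential_survivor(1.0))
# Weibull with shape 2: an aged component is less likely to survive a further unit of time.
print(residual_survivor(weibull_survivor, 3.0, 1.0), weibull_survivor(1.0))
```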

The exponential distribution bears an important relationship to the Poisson
distribution: if the number of events per unit time is given by a Poisson
distribution then the distribution of the interevent time is exponential. Suppose
that a component on failure is replaced by an identical component and the
mean number of failures per unit time is $\lambda$. If failures occur according to the
Poisson law, the distribution of the number of failures in time t is given by

$$p_k(t) = P[\text{Number of failures} = k] = \frac{(\lambda t)^k e^{-\lambda t}}{k!}$$

That no failure occurs during the interval (0, t) is equivalent to having the
inter-failure time greater than t, i.e.

$$1 - F(t) = p_0(t) = e^{-\lambda t}$$

Differentiating both sides with respect to t

$$f(t) = \lambda\,e^{-\lambda t}$$

It can therefore be seen that for Poisson failures, the distribution of the time 
between failures is exponential. 

The exponential distribution has desirable mathematical properties and is 
used widely for representing operating and repair times. The data collected in 
connection with complex equipment shows that their time of operation is 
indeed well described by the exponential law. It can be shown mathematically 
that if a complex piece of equipment has a large number of stochastically 
independent components such that each component is replaced immediately on
failure and every component failure causes equipment failure, then after a long time the
distribution of operating time of the equipment is well approximated by the 
exponential law. Though the exponential law generally holds for the operating 
time during the useful life of the equipment, no single hypothesis seems to exist 
for the repair time. It is generally realized that an exponential representation for 
the repair time may not be valid in many cases. When the operating time is 
exponentially distributed and its mean value is much larger than the mean down 
time, then the steady state values are not generally significantly affected by the 
nature of the repair time distribution. The question of non-exponential 
distributions for down times is examined in detail in Chapter 6. 

2 Normal Distribution 

A continuous random variable is said to be normally distributed if its probability 
density function is of the form 

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x - m)^2/2\sigma^2}, \quad -\infty < x < \infty \tag{2.50}$$

where $\sigma$ is positive and m is any constant. The graph of (2.50) is bell-shaped and
is symmetrical about $x = m$ as shown in Fig. 2.2. In the special case when
$m = 0$ and $\sigma = 1$, this function is called the standard normal density function.




The probability distribution for the normally distributed random variable is 
given by 

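$$F(x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{-(u - m)^2/2\sigma^2}\,du \tag{2.60}$$

This function cannot be expressed in closed form in terms of familiar
elementary functions. The function F(x) can be transformed into the standard
normal distribution by a simple change of variable

$$z = \frac{u - m}{\sigma}$$

Expression (2.60) becomes

$$F(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{(x - m)/\sigma} e^{-z^2/2}\,dz$$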



The values of the standard normal probability distribution function are 
obtained by numerical integration and are tabulated in every elementary 
statistics book. The use of these tables is left as an exercise to the reader. The 
moment generating function of a normally distributed random variable is 



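$$\psi(\theta) = E(e^{\theta X}) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{\theta x}\,e^{-(x - m)^2/2\sigma^2}\,dx$$

Making the substitution

$$z = \frac{x - m}{\sigma} \quad \text{so that } x = \sigma z + m \text{ and } dx = \sigma\,dz$$

$$\psi(\theta) = e^{m\theta}\,\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(z^2 - 2\sigma\theta z)/2}\,dz
= e^{m\theta + \sigma^2\theta^2/2}\,\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(z - \sigma\theta)^2/2}\,dz$$

Making the substitution $w = z - \sigma\theta$, $dz = dw$

$$\psi(\theta) = e^{m\theta + \sigma^2\theta^2/2}\,\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-w^2/2}\,dw = e^{m\theta + \sigma^2\theta^2/2}$$

$$E(X) = \text{mean} = \psi^{(1)}(\theta)\big|_{\theta=0} = (m + \sigma^2\theta)\,e^{m\theta + \sigma^2\theta^2/2}\big|_{\theta=0} = m$$

$$E(X^2) = \psi^{(2)}(\theta)\big|_{\theta=0} = \left[(m + \sigma^2\theta)^2 + \sigma^2\right] e^{m\theta + \sigma^2\theta^2/2}\big|_{\theta=0} = m^2 + \sigma^2$$

Therefore

$$\mathrm{Var}(X) = E(X^2) - (E(X))^2 = \sigma^2$$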

The parameters m and $\sigma$, therefore, represent the mean and standard
deviation of the normal distribution. The domain of the normal distribution
extends from $-\infty$ to $+\infty$. The operating times of the components are limited,
however, to the positive values. If a normal distribution is to be used for 
modelling the operating or down times of the components, then its truncated 
version can be used. The truncated normal distribution is given by 

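$$f(x) = \frac{1}{a\sigma\sqrt{2\pi}}\,e^{-(x - m)^2/2\sigma^2}, \quad x > 0 \tag{2.62}$$

where a is the normalizing factor and is

$$a = \frac{1}{\sqrt{2\pi}} \int_{-m/\sigma}^{\infty} e^{-u^2/2}\,du$$

The survivor function is given by

$$R(x) = \int_x^{\infty} f(u)\,du$$

and the hazard function

$$\phi(x) = \frac{f(x)}{R(x)}$$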



The hazard rate of the normal distribution is a monotonically increasing
function and the graphs of the probability density function, probability
distribution function and the hazard rate are shown in Fig. 2.2.

[Fig. 2.2 Characteristics of the normal distribution: probability density function, survivor function and hazard rate]
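In place of printed tables, the standard normal distribution function can be evaluated with the error function. The sketch below uses Python's `math.erf` and shows the change of variable $z = (x - m)/\sigma$ as well as the normalizing factor of the truncated normal distribution; the function names are hypothetical.

```python
import math

def standard_normal_cdf(z):
    """Standard normal probability integral, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_cdf(x, m, sigma):
    """F(x) for mean m and standard deviation sigma, via z = (x - m) / sigma."""
    return standard_normal_cdf((x - m) / sigma)

def truncation_factor(m, sigma):
    """Normalizing factor a: standard normal mass between -m/sigma and infinity."""
    return 1.0 - standard_normal_cdf(-m / sigma)

print(normal_cdf(7.0, 5.0, 2.0))       # probability of a value below m + sigma
print(truncation_factor(2.0, 1.0))     # close to one when m is well above zero
```

When the mean is several standard deviations above zero the factor a is close to one, so the truncation changes the density very little.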



3 Log-normal Distribution 

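A non-negative random variable X is said to have a log-normal distribution with
parameters m and $\sigma$ if $Y = \log X$ is normally distributed with parameters m and $\sigma$.
Since Y is a non-decreasing function of X

$$P(X \le x) = P(Y \le y), \quad y = \log x$$

or

$$F_X(x) = F_Y(\log x)$$

Differentiating both sides

$$f_X(x) = \frac{1}{x}\,f_Y(\log x)$$

Therefore if

$$f_Y(y) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(y - m)^2/2\sigma^2}$$

Then

$$f(x) = \frac{1}{x\sigma\sqrt{2\pi}}\,e^{-(\log x - m)^2/2\sigma^2}, \quad x > 0 \tag{2.63}$$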

which is the probability density function of X having a log-normal distribution. 
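The corresponding cumulative distribution function is

$$F(x) = \int_0^x \frac{1}{u\sigma\sqrt{2\pi}}\,e^{-(\log u - m)^2/2\sigma^2}\,du$$

Making the substitution

$$z = \frac{\log u - m}{\sigma}$$

$$F(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{(\log x - m)/\sigma} e^{-z^2/2}\,dz$$

which is the standard normal probability integral. The hazard rate is given by

$$\phi(x) = \frac{f(x)}{1 - F(x)}$$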



The hazard rate of a log-normal distribution can be shown to increase to a 
maximum value and then decrease to zero as $x \to \infty$. The log-normal distribution
does not seem to be physically suited to model component lifetimes. It does 
however, seem to provide a reasonable fit for many component repair times. 
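The rise and fall of the log-normal hazard rate can be seen numerically. The sketch below assumes parameters m = 0 and $\sigma = 1$ (illustrative values) and evaluates the hazard as density over survivor function, using the complementary error function for the tail.

```python
import math

def lognormal_pdf(x, m, sigma):
    z = (math.log(x) - m) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

def lognormal_survivor(x, m, sigma):
    """1 - F(x), computed with erfc to keep accuracy in the upper tail."""
    z = (math.log(x) - m) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def lognormal_hazard(x, m, sigma):
    return lognormal_pdf(x, m, sigma) / lognormal_survivor(x, m, sigma)

# Assumed parameters m = 0, sigma = 1.
for x in (0.1, 1.0, 10.0, 100.0):
    print(x, lognormal_hazard(x, 0.0, 1.0))
```

The printed values increase at first and then fall away for large x.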

The expression for the kth initial moment is easily derived as 



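$$m_k(X) = E(X^k) = E(e^{kY}) = e^{mk + \sigma^2 k^2/2} \tag{2.64}$$

The mean is

$$m_X = e^{m + \sigma^2/2} \tag{2.65}$$

and the variance

$$V(X) = m_2 - m_X^2 = e^{2m + \sigma^2}\left(e^{\sigma^2} - 1\right) \tag{2.66}$$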



The graphs of the probability density function, probability distribution function 
and hazard rate are given in Fig. 2.3 



[Fig. 2.3 Characteristics of the log-normal distribution: probability density function and survivor function]



4 Weibull Distribution 

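The Weibull distribution is defined by the following probability density function

$$f(x) = \alpha\rho(\rho x)^{\alpha - 1}\,e^{-(\rho x)^{\alpha}}, \quad x > 0 \tag{2.67}$$

The survivor function

$$R(x) = \int_x^{\infty} \alpha\rho(\rho u)^{\alpha - 1}\,e^{-(\rho u)^{\alpha}}\,du = e^{-(\rho x)^{\alpha}} \tag{2.68}$$

and therefore the hazard function

$$\phi(x) = \alpha\rho(\rho x)^{\alpha - 1} \tag{2.69}$$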

The hazard rate increases with x for $\alpha > 1$ and decreases for $\alpha < 1$. The graphs
of the probability density, probability distribution and the hazard rate of a 
Weibull distribution are given in Fig. 2.4. The kth moment of a Weibull 
distribution can be calculated by 

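$$E(X^k) = \int_0^{\infty} x^k\,\alpha\rho(\rho x)^{\alpha - 1}\,e^{-(\rho x)^{\alpha}}\,dx$$

Substituting $z = (\rho x)^{\alpha}$

$$m_k = \frac{1}{\rho^k} \int_0^{\infty} z^{k/\alpha}\,e^{-z}\,dz = \frac{\Gamma(1 + k/\alpha)}{\rho^k} \tag{2.70}$$

where $\Gamma(a)$ is a gamma function defined by

$$\Gamma(a) = \int_0^{\infty} z^{a - 1}\,e^{-z}\,dz$$

When a is a positive integer

$$\Gamma(a) = (a - 1)!$$

The mean value

$$m = \frac{\Gamma(1 + 1/\alpha)}{\rho} \tag{2.71}$$

The second initial moment

$$m_2 = \frac{\Gamma(1 + 2/\alpha)}{\rho^2}$$

and therefore variance

$$V(X) = \frac{\Gamma(1 + 2/\alpha) - \left(\Gamma(1 + 1/\alpha)\right)^2}{\rho^2} \tag{2.72}$$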



[Fig. 2.4 Characteristics of the Weibull distribution: probability density function, survivor function and hazard rate, plotted against t]



The Weibull distribution is often used in connection with mechanical 
components and has been used to model fatigue failure and the failure of ball 
bearings. It has also been recently used to describe the repair time of electric 
power generating units. 
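The moment formulas of the Weibull distribution are easy to evaluate with a gamma function routine. The sketch below uses Python's `math.gamma`; the parameter values in the usage lines are illustrative, and the function names are hypothetical.

```python
import math

def weibull_moment(k, alpha, rho):
    """kth initial moment: Gamma(1 + k/alpha) / rho**k."""
    return math.gamma(1.0 + k / alpha) / rho ** k

def weibull_mean_variance(alpha, rho):
    """Mean and variance from the first two initial moments."""
    m1 = weibull_moment(1, alpha, rho)
    m2 = weibull_moment(2, alpha, rho)
    return m1, m2 - m1 * m1

# With alpha = 1 the Weibull law reduces to the exponential law.
print(weibull_mean_variance(1.0, 2.0))
# Shape 2 with rho = 1 (assumed values).
print(weibull_mean_variance(2.0, 1.0))
```

For $\alpha = 1$ the results reduce to the exponential values $1/\rho$ and $1/\rho^2$, which gives a quick consistency check.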



5 Gamma distribution 

A non-negative continuous random variable X is said to be gamma distributed if 
its probability density function is given by 



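$$f(x) = \frac{\rho(\rho x)^{\alpha - 1}\,e^{-\rho x}}{\Gamma(\alpha)} \tag{2.73}$$

This density is a function of parameters $\rho$ and $\alpha$, both of which are positive
constants. When $\alpha$ is an integer equal to a, then this distribution is also called the
special Erlangian distribution,

$$f(x) = \frac{\rho(\rho x)^{a - 1}\,e^{-\rho x}}{(a - 1)!}$$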



The probability distribution function is given by

$$F(x) = \int_0^x \frac{\rho(\rho u)^{\alpha - 1}\,e^{-\rho u}}{\Gamma(\alpha)}\,du$$

Putting $\rho u = z$, $du = \frac{1}{\rho}\,dz$

$$F(x) = \frac{1}{\Gamma(\alpha)} \int_0^{\rho x} z^{\alpha - 1}\,e^{-z}\,dz$$

The function $\int_0^{\rho x} z^{\alpha - 1} e^{-z}\,dz$ is called the incomplete gamma function. When $\alpha$ is
an integer a (special Erlangian distribution), it can be shown by integrating by
parts that

$$F(x) = 1 - \sum_{k=0}^{a-1} \frac{(\rho x)^k\,e^{-\rho x}}{k!}$$

The Laplace transform of a gamma density function is given by

$$\bar{f}(s) = \left(\frac{\rho}{\rho + s}\right)^{\alpha}$$

$$\text{Therefore the mean} = (-1)\bar{f}^{(1)}(s)\big|_{s=0} = \frac{\alpha}{\rho}$$

$$\text{The second initial moment} = (-1)^2 \bar{f}^{(2)}(s)\big|_{s=0} = \frac{\alpha^2}{\rho^2} + \frac{\alpha}{\rho^2}$$

and the variance

$$= E(X^2) - (E(X))^2 = \frac{\alpha}{\rho^2}$$

Consequently the standard deviation is $\sqrt{\alpha}/\rho$ and the coefficient of variation
is $1/\sqrt{\alpha}$. Some typical cases of the gamma distribution are shown in Fig. 2.5. The
hazard rate is non-decreasing for $\alpha > 1$ and is bounded by $\rho$. For $0 < \alpha < 1$, the
hazard rate is non-increasing and decreases towards $\rho$ as $x \to \infty$.

[Fig. 2.5 Characteristics of the gamma distribution: probability density function, survivor function and hazard rate, plotted against t]

The gamma distribution is very useful due to its simplicity and also because 
it can be used to approximate many empirical distributions. A more detailed 
treatment of its usefulness is given in Chapter 6 while discussing the device of
stages. A special case is when in Equation (2.73) $\rho = 1/2$ and $\alpha = n/2$, where n is an
integer. This distribution is the $\chi^2$ (chi squared) distribution with n degrees of
freedom. 
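The closed form of the special Erlangian distribution function can be compared with direct numerical integration of the gamma density. The sketch below assumes $\alpha \ge 1$ (so that the density is finite at zero) and uses the trapezoidal rule; the parameter values and function names are illustrative.

```python
import math

def erlang_cdf(x, a, rho):
    """F(x) = 1 - sum_{k=0}^{a-1} (rho x)^k e^{-rho x} / k! for integer a."""
    return 1.0 - sum(
        (rho * x) ** k * math.exp(-rho * x) / math.factorial(k) for k in range(a)
    )

def gamma_pdf(x, alpha, rho):
    """Gamma density; this sketch assumes alpha >= 1 so that x = 0 is allowed."""
    return rho * (rho * x) ** (alpha - 1) * math.exp(-rho * x) / math.gamma(alpha)

def gamma_cdf_numeric(x, alpha, rho, steps=20_000):
    """Trapezoidal integration of the density from 0 to x."""
    h = x / steps
    total = 0.5 * (gamma_pdf(0.0, alpha, rho) + gamma_pdf(x, alpha, rho))
    for i in range(1, steps):
        total += gamma_pdf(i * h, alpha, rho)
    return h * total

# Assumed values: three stages, rho = 2, evaluated at x = 1.5.
print(erlang_cdf(1.5, 3, 2.0), gamma_cdf_numeric(1.5, 3, 2.0))
```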

Stochastic Processes 

Consider the example of two transmission links again. Z(t) defines the state of 
this system at time t. There is a random variable associated with each value of t. 
The family of random variables (Z(t), t >0) is called a stochastic process. The 
values assumed by the process are called the states of the system and the set of 
all possible states is called the state space. The set of the possible values of the 
indexing parameter is called the parameter space. The indexing parameter in the 
above example is time but other kinds of indexing parameter such as space are 
also possible. For example the number of fibres at a point on a yarn can be 
considered a stochastic process with the length of the yarn as the parameter. 
This book is, however, generally concerned with time dependent processes and 
therefore time is the basic indexing parameter. The stochastic process is, 
therefore, considered as a model of a system which develops in time according 
to probabilistic laws. 

Consider two identical and independent systems of two transmission links. 
The state space for each system is defined as follows: 

Z(t) = 0 both links up 
Z(t) = 1 one link up 
Z(t) = 2 no link up 

Assume that at time zero both the links are up, and the systems are observed
for twelve hours each. The state of each system as a function of time is shown
in Fig. 2.6. These two observations are called the two independent realizations 
of the stochastic process. Realizations or sample paths could be obtained either 
by observing identical and independent systems or they could be constructed by 
appropriately using a table of random numbers or some equivalent randomizing 
device. The realizations could differ from one another in detail and in fact these 
differences are typical of random phenomena. The idea of realizations often
helps to provide a better insight in certain reliability problems. 



[Fig. 2.6 Realizations of a stochastic process: state Z(t) against time (axis marked 0 to 13) for System 1 and System 2]

The stochastic processes may be classified on the basis of the nature of the 
state space and the parameter space. The following are the four possible 
combinations: 

1 Discrete state and parameter spaces 

An example of this nature would be the number of successful missile flights 
in a missile firing scheme. The indexing parameter would be the number of 
missiles fired. 

2 Discrete state space and continuous parameter space 

Most of this book is concerned with stochastic processes of this kind. The 
states of a system of components with time as the indexing parameter is an 
example of this type of stochastic process. 



3 Continuous state space and discrete parameter space 

An example of this is the load on the electric power system observed every 
hour. The solution as such requires more sophisticated tools. These problems 
can, however, be often idealized by Category 1 . 






4 Continuous state space and continuous parameter space 

An example of this nature is the storage in a dam observed as a function of 
time. Such problems can be idealized by Categories 1 or 2. 

Probability Distributions 

A stochastic process is defined for a set of points which may be either integer 
coordinates ($n = 0, 1, 2, \ldots$) or an interval of real time ($0 \le t$ or $-\infty < t < \infty$).
At a particular point, the stochastic process is a simple random variable.
Assuming k arbitrary time points $\ldots, t_l < t_m < t_n < \ldots$, there are k
random variables $Z(t_l), Z(t_m), Z(t_n), \ldots$ In the discrete time case these may
be denoted by $Z_l, Z_m, Z_n, \ldots$ The stochastic process is completely determined
in principle if the joint distribution of $Z_l, Z_m, Z_n, \ldots$ is known for every k and
every choice of l, m and n. In practice, however, it is rarely possible to work
with these joint distributions and most of the information of interest can be 
obtained from transition distribution functions. One property of basic interest 
in the reliability evaluation of a system is the probability distribution of Z n 
for the discrete time case and Z(t) in the continuous time case. 

Consider first the discrete time case. The stochastic process is said to be 
independent if 

$$P(Z_n = x \mid Z_m = y, Z_l = z, \ldots) = P(Z_n = x)$$

This means that the probability distribution of $Z_n$ is independent of the
present and the past history of the process. A slight weakening of this
condition leads to the well known class of stochastic processes called the
Markov process. In this case

$$P(Z_n = x \mid Z_m = y, Z_l = z, \ldots) = P(Z_n = x \mid Z_m = y)$$

That is, the probability distribution of $Z_n$ depends on the latest of the time
points and none prior to that. For this reason, the Markov process is sometimes 
called memoryless. The Markov property essentially states that once the state 
occupied at a time point is known, the previous history of the process is not 
involved in determining the subsequent probability distributions. The Chapman— 
Kolmogorov equation gives the conditional probability density function for this 
process 

$$P(Z_n = x \mid Z_l = z) = \int_{-\infty}^{\infty} P(Z_m = y \mid Z_l = z)\,P(Z_n = x \mid Z_m = y)\,dy \tag{2.74}$$

Equation (2.74) is for the continuous state space, discrete time case. The
corresponding equation for continuous state space and continuous time can be
written as

$$P(Z(t_n) \le x \mid Z(t_l) = z) = \int_{-\infty}^{\infty} P(Z(t_n) \le x \mid Z(t_m) = y)\,dP(Z(t_m) \le y \mid Z(t_l) = z) \tag{2.75}$$

Equations (2.74) and (2.75) in their general form are rarely used in practice. 
They, however, convey the fundamental idea of recursively building the 
conditional probability density function over the long time interval (l, n) from
those over the shorter time intervals (l, m) and (m, n). If the conditional
probability density function depends only on the distance $t_n - t_l$ and not on
$t_n$ and $t_l$ individually, the stochastic process is called time homogeneous.

The remainder of this chapter discusses the 'discrete state space and discrete 
time' and 'discrete state space and continuous time' Markov processes. Most of 
the reliability modelling falls into the latter case. It is, however, sometimes 
convenient to idealize the continuous time by discrete time processes. The next 
chapter discusses the stochastic processes from the point of view of the 
frequency balance. 

Markov Chains 

This section considers a Markov process with discrete state space and discrete 
parameter space. Equation (2.74) can be simplified when the state space is 
discrete 

$$P(Z_n = x \mid Z_l = z) = \sum_y P(Z_n = x \mid Z_m = y)\,P(Z_m = y \mid Z_l = z)$$

where x, y, z now denote the discrete states of the system. This equation 
develops the conditional probability density function over the longer 
interval from those of shorter interval. In practice, however, it is usual to work 
with one step transition probabilities. In this case the Markov property states 

$$P(Z_n = x \mid Z_{n-1} = y, Z_{n-2} = z, \ldots) = P(Z_n = x \mid Z_{n-1} = y) \tag{2.76}$$

If this one step transition probability is independent of n, i.e. 

P(Z n = x\Z n _ 1 = y) = P{Z m =x\Z m . l =-y) 
the process is time homogenous and transition probabilities are termed 



The Preliminaries 37 



stationary. The one step stationary probabilities will be denoted by p^, which is 
the probability of transiting from state / to state / in one step. It is easy to see 
that 

j 

The n step transition probability p\f can be similarly defined as 
p\? = P(Z m+n = i\Z m = i) 
The one step transition probabilities can be arranged in matrix form 
P = (Pa) 

This matrix is called the transition matrix and its yth entry is the probability 
of transiting from state i to state / in one step. Each row sums to unity. A matrix 
which has non-negative entries with each row summing to 1 is called a 
stochastic matrix. Equation (2.75) can be written in terms of the single step 
transition probabilities as follows 

$$p_{ij}^{(2)} = \sum_k p_{ik}\, p_{kj} \qquad (2.77)$$

Arranged in matrix form

$$P^{(2)} = P \cdot P = P^2 \qquad (2.78)$$

The matrix of two step transition probabilities can be found by squaring the
transition matrix. It can be easily seen that

$$P^{(n)} = P^n \qquad (2.79)$$

In practice, interest is usually focussed on the probability distribution of Z n 
given the initial state of the system. The initial state of the system is defined by 
an initial probability vector 

$$p^{(0)} = (p_0, p_1, p_2, \ldots)$$

The vector of state probabilities after n steps is found by

$$p^{(n)} = p^{(0)} P^n \qquad (2.80)$$

Written in the component form, this formula becomes

$$p_j^{(n)} = \sum_k p_k\, p_{kj}^{(n)} \qquad (2.81)$$

where

$p_j^{(n)}$ = The probability of being in state j after n steps.
$p_k$ = The probability of being in state k at the start,
and $p_{kj}^{(n)}$ = The probability of being in state j in n steps starting in state k.



Example: A person is practising firing. If he misses, he becomes nervous and
the probability of the next shot being a hit reduces to 1/2, but a hit bolsters his
confidence and the chance of the next shot being a hit increases to 3/4. If the
initial shot is a hit, what is the probability of a hit on the fourth shot? Also
calculate this probability for the initial shot being a miss.

Designating the hit by 0 and miss by 1, the object is to find the probability
distribution for $Z_4$. The state transition diagram is shown in Fig. 2.7.



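From Equation (2.80)

$$p^{(3)} = p^{(0)} P^3$$

For the first shot being a hit

$$p^{(3)} = (1 \;\; 0) \begin{bmatrix} 43/64 & 21/64 \\ 21/32 & 11/32 \end{bmatrix} = \left( \tfrac{43}{64} \;\; \tfrac{21}{64} \right)$$

Fig. 2.7 State transition diagram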



That is the probability of a hit on the fourth shot, the first shot being a hit, 
is 43/64. If, however, the initial shot is a miss 



$$p^{(3)} = (0 \;\; 1) \begin{bmatrix} 43/64 & 21/64 \\ 21/32 & 11/32 \end{bmatrix} = \left( \tfrac{21}{32} \;\; \tfrac{11}{32} \right)$$





The probability of a hit in this case is slightly less than the previous case. 
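The two results above can be checked with a short sketch of the matrix multiplication technique of Equation (2.80). The helper names `mat_mul` and `mat_pow` are illustrative choices, not part of the text, and exact fractions are used so that the values 43/64 and 21/32 appear without rounding.

```python
from fractions import Fraction as F

def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mat_pow(p, n):
    """Return the nth power of a square matrix by repeated multiplication."""
    size = len(p)
    result = [[F(int(i == j)) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, p)
    return result

# State 0 = hit, state 1 = miss (firing practice example).
P = [[F(3, 4), F(1, 4)],
     [F(1, 2), F(1, 2)]]

P3 = mat_pow(P, 3)
after_hit = mat_mul([[F(1), F(0)]], P3)[0]    # initial vector (1 0)
after_miss = mat_mul([[F(0), F(1)]], P3)[0]   # initial vector (0 1)
print(after_hit[0], after_miss[0])            # hit probabilities on the fourth shot
```

The first entry of each result vector is the probability of a hit on the fourth shot for the corresponding initial shot.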

It should be noted here that the (ij)th entry of $P^n$ represents the probability
of being in the jth state after n steps, given the system started in state i. The
states of a discrete Markov chain can be classified into the following types.

If the states i and j can be reached from each other in a finite number of
steps, they are said to communicate. The set of states in which each pair of
states communicate and which once entered cannot be left is called a closed
communicating class. This is also called an ergodic set of states. On the other
hand, a set of states in which every state can be reached from every other state
but which can be left once entered is called a transient set. The discrete chain
in which every state can be reached from every other state is termed irreducible
or ergodic. In other words the states form a single closed communicating set.
An ergodic chain in which each state can be entered only at certain periodic
intervals is called a cyclic or periodic chain. If a state exhibits this
characteristic, then the state is termed periodic or cyclic. A discrete Markov
chain which is ergodic and a-periodic is called a regular chain. The periodic
chains and states are troublesome to deal with, but fortunately reliability
problems are most frequently described by regular chains.



Equilibrium Distribution 

In the firing practice example

$$P^3 = \begin{bmatrix} 0.6719 & 0.3281 \\ 0.6563 & 0.3438 \end{bmatrix}$$

$$P^6 = \begin{bmatrix} 0.6667 & 0.3333 \\ 0.6665 & 0.3335 \end{bmatrix}$$

and for a still higher power of P

$$P^n = \begin{bmatrix} 0.6666 & 0.3334 \\ 0.6666 & 0.3334 \end{bmatrix}$$

It can be seen that the entries of $P^n$ seem to be approaching a limiting value. Is
this true in all cases? The following results are stated without proof.



1. In any Markov chain which is not cyclic the limit $\alpha_j = \lim_{n \to \infty} p_j^{(n)}$
exists.

2. In any a-periodic, irreducible Markov chain the above limit does not depend
on the initial probability distribution so that

$$\lim_{n \to \infty} p_{ij}^{(n)} = \alpha_j$$

3. In a finite regular Markov chain, each row of $P^n$ approaches a stationary probability
vector $\alpha = (\alpha_0, \alpha_1, \ldots)$. This is called the unique stationary probability vector
of the process and

$$\alpha P = \alpha \qquad (2.82)$$



This relationship is very useful for determining the limiting state probabilities 
(also called steady state probabilities) of the process. In the firing practice 
example 



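$$(\alpha_0 \;\; \alpha_1) \begin{bmatrix} 3/4 & 1/4 \\ 1/2 & 1/2 \end{bmatrix} = (\alpha_0 \;\; \alpha_1)$$

i.e.

$$-\tfrac{1}{4}\alpha_0 + \tfrac{1}{2}\alpha_1 = 0 \qquad (2.83)$$

$$\tfrac{1}{4}\alpha_0 - \tfrac{1}{2}\alpha_1 = 0 \qquad (2.84)$$

These two equations are identical, therefore an equation of the following
form can be used

$$\alpha_0 + \alpha_1 = 1 \qquad (2.85)$$

From Equations (2.83) and (2.85)

$$\alpha_0 = \tfrac{2}{3} \quad \text{and} \quad \alpha_1 = \tfrac{1}{3}$$

It can be seen that these values could also be obtained by multiplying P by
itself a large number of times.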
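As a hedged illustration of Equation (2.82), the sketch below solves $\alpha P = \alpha$ together with the normalising condition for any regular chain of modest size. The function names `solve` and `stationary_vector` are assumptions made for this example, and the Gauss-Jordan routine is a plain textbook elimination rather than anything prescribed by the text.

```python
from fractions import Fraction as F

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]

def stationary_vector(p):
    """Return alpha with alpha P = alpha and sum(alpha) = 1."""
    n = len(p)
    # Equation j reads: sum_i alpha_i (p_ij - delta_ij) = 0.
    eqs = [[p[i][j] - int(i == j) for i in range(n)] for j in range(n)]
    rhs = [F(0)] * n
    # The n equations are linearly dependent, so the last one is
    # replaced by the normalising condition sum(alpha) = 1.
    eqs[-1] = [F(1)] * n
    rhs[-1] = F(1)
    return solve(eqs, rhs)

P = [[F(3, 4), F(1, 4)],
     [F(1, 2), F(1, 2)]]
alpha = stationary_vector(P)
print(alpha)   # stationary probabilities of hit and miss
```

Replacing one balance equation by the normalising condition mirrors the step taken above, where (2.84) is dropped in favour of (2.85).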






Time Specific Behaviour 

It has been shown that the n-step probability distribution of the discrete Markov
chain can be found from $P^n$ where P is the transition matrix. In determining
higher powers of P, the following matrix product is often useful

$$P^n = P^{n-m} P^m \qquad (2.86)$$



The multiplication of large matrices is quite unwieldy using hand calculations 
but easily accomplished when a digital computer is used. Though this method of 
matrix multiplication is quite useful, the following technique can be used for 
very large powers of P. 

If the matrix P has N distinct real eigenvalues, then it can be proved that
there exists a matrix S having an inverse $S^{-1}$ such that

$$S P S^{-1} = D \qquad (2.87)$$

The matrix P is then said to be similar to the diagonal matrix D. In the 
diagonal matrix all but the diagonal elements are zero. The diagonal elements 
of D are the eigenvalues of P and can be determined from the following 
relationship. 

$$\det(P - dI) = 0 \qquad (2.88)$$

Equation (2.87) can be rearranged as

$$S P = D S \qquad (2.89)$$

The row $s_i$ of S is termed the ith left eigenvector associated with the eigenvalue
$d_i$. Similarly rearranging (2.89)

$$P S^{-1} = S^{-1} D \qquad (2.90)$$

The jth column vector of $S^{-1}$ is termed the jth right eigenvector of P. It can be
seen from (2.87) that

$$S P S^{-1} S P S^{-1} = S P^2 S^{-1} = D^2$$

and by induction

$$P^n = S^{-1} D^n S \qquad (2.91)$$



The nth power of P can therefore be found from the nth power of the diagonal
matrix which is easy. The difficult part, however, is to determine the eigenvalues
and the associated eigenvectors of P. Several numerical techniques are available 
for determining these elements. In many practical problems, the basic matrix 
multiplication technique is quite adequate. The matrix algebra approach to the 
firing practice problem is as follows 



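$$P = \begin{bmatrix} 3/4 & 1/4 \\ 1/2 & 1/2 \end{bmatrix}$$

Therefore

$$(P - dI) = \begin{bmatrix} 3/4 - d & 1/4 \\ 1/2 & 1/2 - d \end{bmatrix}$$

The eigenvalues can now be found by equating the determinant to zero.

$$\begin{vmatrix} 3/4 - d & 1/4 \\ 1/2 & 1/2 - d \end{vmatrix} = 0$$

$$4d^2 - 5d + 1 = 0$$

The roots of this equation give the eigenvalues of P

$$d_0 = 1, \qquad d_1 = \tfrac{1}{4}$$

Therefore

$$D = \begin{bmatrix} 1 & 0 \\ 0 & 1/4 \end{bmatrix}$$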



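The next step involves determining the left and right eigenvectors of P.
From Equation (2.89), the row of S associated with $d_0 = 1$ satisfies

$$\tfrac{3}{4} s_{00} + \tfrac{1}{2} s_{01} = s_{00}$$

i.e.

$$-\tfrac{1}{4} s_{00} + \tfrac{1}{2} s_{01} = 0 \qquad (2.93)$$

and

$$\tfrac{1}{4} s_{00} - \tfrac{1}{2} s_{01} = 0 \qquad (2.94)$$

Also, for the row associated with $d_1 = \tfrac{1}{4}$

$$\tfrac{1}{2} s_{10} + \tfrac{1}{2} s_{11} = 0 \qquad (2.95)$$

and

$$\tfrac{1}{4} s_{10} + \tfrac{1}{4} s_{11} = 0 \qquad (2.96)$$

Equations (2.93) and (2.94) are identical, as are Equations (2.95) and (2.96).
There are now two equations and four unknowns and therefore the magnitudes
of the eigenvectors cannot be uniquely determined. Assuming

$$s_{01} = \tfrac{1}{2}, \qquad s_{10} = -1$$

Therefore

$$S = \begin{bmatrix} 1 & 1/2 \\ -1 & 1 \end{bmatrix}$$

It should be noted that the elements of S, in general, are not determined
uniquely. Each row of S is determined up to a multiplicative constant. $S^{-1}$ can
be found by inverting S

$$S^{-1} = \begin{bmatrix} 2/3 & -1/3 \\ 2/3 & 2/3 \end{bmatrix}$$

Using (2.91)

$$P^n = S^{-1} \begin{bmatrix} 1 & 0 \\ 0 & (1/4)^n \end{bmatrix} S = \begin{bmatrix} \tfrac{2}{3} + \tfrac{1}{3}(\tfrac{1}{4})^n & \tfrac{1}{3} - \tfrac{1}{3}(\tfrac{1}{4})^n \\ \tfrac{2}{3} - \tfrac{2}{3}(\tfrac{1}{4})^n & \tfrac{1}{3} + \tfrac{2}{3}(\tfrac{1}{4})^n \end{bmatrix}$$

For n = 3

$$P^3 = \begin{bmatrix} 43/64 & 21/64 \\ 21/32 & 11/32 \end{bmatrix}$$

as previously found by the matrix multiplication approach. Similarly, $P^n$ can be
evaluated for any other value of n from the same expression.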
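The diagonalisation of Equation (2.91) can be sketched numerically for the firing practice problem. The matrix S below is one admissible scaling of the left eigenvectors (each row is fixed only up to a constant), so it should be read as an assumption of this example; any other scaling gives the same $P^n$.

```python
from fractions import Fraction as F

def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

P = [[F(3, 4), F(1, 4)],
     [F(1, 2), F(1, 2)]]

# Rows of S are left eigenvectors of P for the eigenvalues 1 and 1/4
# (assumed scaling: s01 = 1/2, s10 = -1).
S = [[F(1), F(1, 2)],
     [F(-1), F(1)]]
S_INV = [[F(2, 3), F(-1, 3)],
         [F(2, 3), F(2, 3)]]

def p_power(n):
    """Return P**n from S^-1 D^n S, Equation (2.91)."""
    d_n = [[F(1), F(0)],
           [F(0), F(1, 4) ** n]]
    return mat_mul(mat_mul(S_INV, d_n), S)

print(p_power(3))   # same matrix as obtained by direct multiplication
```

Only the scalar powers of the eigenvalues change with n, which is why this route is attractive for very large powers of P.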



First Passage Times 

One parameter of interest in many Markov Chain problems is the time to 
encounter a state for the first time. This is called the first passage time. If this 
state is an absorbing state or has been made an absorbing state, this is called the 
time of absorption. In reliability engineering this concept is used to calculate the 
mean time to first failure, MTTFF. As noted earlier, almost all the cases of 
practical interest are regular chains, i.e. chains in which all the states communicate 
and which are not cyclic. In these cases, the mean first passage times and their 
variance can be obtained from the fundamental matrix Z defined as below 



$$Z = [I - (P - A)]^{-1} \qquad (2.97)$$

where

I is the identity matrix
P is the transition matrix
and A is the matrix each row of which is the limiting probability vector

The mean first passage time matrix $\bar{T}$ is given by

$$\bar{T} = [I - Z + U Z_d]\, D \qquad (2.98)$$

where

$\bar{T}$ is the mean first passage time matrix such that $\bar{t}_{ij}$ represents the mean
time or mean number of steps to go from state i to j
U A unit matrix, i.e. with all entries 1
$Z_d$ Matrix resulting from Z by setting off-diagonal elements equal to zero
D Diagonal matrix such that $d_{ii} = 1/\alpha_i$

The variance of the first passage times can also be explicitly determined.
Denoting the first passage time from state i to j by $t_{ij}$, define the matrix W

$$W = (E(t_{ij}^2))$$

This matrix can be computed from the fundamental matrix by

$$W = \bar{T}(2 Z_d D - I) + 2[Z\bar{T} - U (Z\bar{T})_d] \qquad (2.99)$$

$(Z\bar{T})_d$ is a matrix obtained by setting the off-diagonal elements of $Z\bar{T}$
equal to zero.

The variance can now be obtained using Equation (2.13)

$$V(t_{ij}) = E(t_{ij}^2) - (E(t_{ij}))^2 = w_{ij} - \bar{t}_{ij}^2$$

Example: A discrete Markov chain has the state transition diagram shown 
below. Find the matrix of the mean first passage times. 




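The transition matrix for this state space diagram is

$$P = \begin{bmatrix} 1/2 & 3/8 & 1/8 \\ 1/2 & 1/4 & 1/4 \\ 1/2 & 1/4 & 1/4 \end{bmatrix}$$

The vector $\alpha$ can be computed by

$$(\alpha_1 \;\; \alpha_2 \;\; \alpha_3) \begin{bmatrix} 1/2 & 3/8 & 1/8 \\ 1/2 & 1/4 & 1/4 \\ 1/2 & 1/4 & 1/4 \end{bmatrix} = (\alpha_1 \;\; \alpha_2 \;\; \alpha_3)$$

On solving these equations along with $\alpha_1 + \alpha_2 + \alpha_3 = 1$

$$\alpha_1 = \tfrac{1}{2}, \qquad \alpha_2 = \tfrac{5}{16}, \qquad \alpha_3 = \tfrac{3}{16}$$

Therefore

$$A = \begin{bmatrix} 1/2 & 5/16 & 3/16 \\ 1/2 & 5/16 & 3/16 \\ 1/2 & 5/16 & 3/16 \end{bmatrix}$$

$$I - [P - A] = \begin{bmatrix} 1 & -1/16 & 1/16 \\ 0 & 17/16 & -1/16 \\ 0 & 1/16 & 15/16 \end{bmatrix}$$

Z can be now determined from the inverse of the above matrix

$$Z = \begin{bmatrix} 1 & 1/16 & -1/16 \\ 0 & 15/16 & 1/16 \\ 0 & -1/16 & 17/16 \end{bmatrix}$$

$$D = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 16/5 & 0 \\ 0 & 0 & 16/3 \end{bmatrix}$$

Substituting into (2.98)

$$\bar{T} = \begin{bmatrix} 1 & 14/16 & 18/16 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 2 & 0 & 0 \\ 0 & 16/5 & 0 \\ 0 & 0 & 16/3 \end{bmatrix} = \begin{bmatrix} 2 & 14/5 & 6 \\ 2 & 16/5 & 16/3 \\ 2 & 16/5 & 16/3 \end{bmatrix}$$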
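The calculation of Equations (2.97) and (2.98) can be sketched as follows. The transition matrix and the vector $\alpha$ are taken here to be those of the three-state example and should be treated as assumptions of this sketch; `inverse` and `mean_first_passage` are illustrative names.

```python
from fractions import Fraction as F

def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination."""
    n = len(a)
    m = [list(row) + [F(int(i == j)) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]

def mean_first_passage(p, alpha):
    """Return the matrix of mean first passage times of a regular chain."""
    n = len(p)
    # Z = [I - (P - A)]^-1, every row of A being alpha (Equation 2.97).
    z = inverse([[int(i == j) - p[i][j] + alpha[j] for j in range(n)]
                 for i in range(n)])
    # T = [I - Z + U Z_d] D with d_jj = 1 / alpha_j (Equation 2.98);
    # the (i, j) entry of U Z_d is simply z_jj.
    return [[(int(i == j) - z[i][j] + z[j][j]) / alpha[j] for j in range(n)]
            for i in range(n)]

P = [[F(1, 2), F(3, 8), F(1, 8)],
     [F(1, 2), F(1, 4), F(1, 4)],
     [F(1, 2), F(1, 4), F(1, 4)]]
ALPHA = [F(1, 2), F(5, 16), F(3, 16)]

T = mean_first_passage(P, ALPHA)
for row in T:
    print([str(v) for v in row])
```

The diagonal entries are the mean recurrence times $1/\alpha_j$, which gives a quick consistency check on any computed matrix.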





Alternative Approach to First Passage Times 

The technique for calculating the mean and variance of first passage times for
regular Markov chains has been illustrated. It is also possible to calculate these
quantities by making state j an absorbing state and applying the theory of
absorbing chains. An absorbing state is one which once entered cannot be left.
The behaviour of the stochastic process before once hitting state j will be the
same as that of the original process. The first passage time from state i to state j
is now the time of absorption starting from state i in the new process. The basic
results for this absorbing chain can be obtained from the fundamental matrix N

$$N = [I - Q]^{-1} \qquad (2.100)$$

where

N = The fundamental matrix whose element $n_{ik}$ denotes the mean number of
times the process is in state k before absorption, the process having
been started in state i.

Q = The matrix obtained by deleting the jth row and the jth column from
matrix P of transition probabilities.

The mean first passage time from state i to j is therefore

$$\bar{t}_{ij} = \sum_{k=1}^{N-1} n_{ik}$$

where the sum runs over the N - 1 non-absorbing states.

The variance column vector is given by

$$V = [2N - I]\,\bar{t} - \bar{t}_{sq} \qquad (2.101)$$

where

$V_i$ = The variance of the first passage time from state i to state j (the one
made into an absorbing state).

$\bar{t}$ = The column vector such that $\bar{t}_i$ is the mean first passage time from
i to j.

$\bar{t}_{sq}$ = The column vector with ith element $\bar{t}_i^2$.

It can be seen that this approach gives additional information about the mean
first passage time by providing the components spent in various states before
once hitting state j. This method can be illustrated by application to the previous
example. Determine the mean first passage times from states 1 and 2 to state 3.
Truncating the third column and row

$$N = [I - Q]^{-1}$$

where

$$Q = \begin{bmatrix} 1/2 & 3/8 \\ 1/2 & 1/4 \end{bmatrix}$$

The fundamental matrix N can be found from its inverse

$$N^{-1} = I - Q = \begin{bmatrix} 1/2 & -3/8 \\ -1/2 & 3/4 \end{bmatrix}, \qquad N = \begin{bmatrix} 4 & 2 \\ 8/3 & 8/3 \end{bmatrix}$$

Starting in state 1, the process visits states 1 and 2, 4 and 2 times before first
hitting state 3. Therefore

$$\bar{t}_{13} = 4 + 2 = 6$$

and

$$\bar{t}_{23} = \tfrac{8}{3} + \tfrac{8}{3} = 5.333$$



It can be seen that these entries agree with the elements of T found 
previously. If state 3 was considered to be the failed state of the system, then 
MTTFF is 6 when state 1 is taken as the initial state. 
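A minimal sketch of the absorbing chain calculation, Equations (2.100) and (2.101), is given below. The matrix Q is assumed to be the two-state truncation used in the example, and `inverse_2x2` is a hypothetical helper that is only valid for 2 x 2 matrices.

```python
from fractions import Fraction as F

def inverse_2x2(m):
    """Closed-form inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Transition probabilities among the non-absorbing states 1 and 2
# (state 3 has been made absorbing); values assumed from the example.
Q = [[F(1, 2), F(3, 8)],
     [F(1, 2), F(1, 4)]]

I_MINUS_Q = [[1 - Q[0][0], -Q[0][1]],
             [-Q[1][0], 1 - Q[1][1]]]
N = inverse_2x2(I_MINUS_Q)              # fundamental matrix, Equation (2.100)

# Mean first passage times to state 3: row sums of N.
mean = [sum(row) for row in N]

# Variances from V = (2N - I) t - t_sq, Equation (2.101).
variance = [sum((2 * N[i][k] - int(i == k)) * mean[k] for k in range(2))
            - mean[i] ** 2 for i in range(2)]

print(mean, variance)
```

The rows of N show how the mean passage time is split between the states visited before absorption, which is the additional information mentioned above.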



Continuous Parameter Markov Chains 

Many of the problems encountered in system reliability can be modelled using 
continuous parameter Markov chains. The next chapter examines frequency 
balancing techniques as an alternative way of looking at the stochastic process. 






For u < v < t, the Markov property for a continuous parameter Markov chain 
would be 

$$P(Z(t) = k \mid Z(v) = j,\ Z(u) = i) = P(Z(t) = k \mid Z(v) = j)$$

This property is basically of the form

$$P(Z(t + x) = j \mid Z(t) = i)$$

and is termed as the probability of transition from state i to state j during the
time interval t to t + x. If this transition probability does not depend on the
initial time t but only on the elapsed time x, then the process is said to be time
homogeneous. This book is primarily concerned with this class of process. The
transition probability will be denoted by

$$p_{ij}(x) = P(Z(t + x) = j \mid Z(t) = i)$$

for any x. The Chapman-Kolmogorov Equation (2.75) can now be written as

$$p_{ij}(t + x) = \sum_k p_{ik}(t)\, p_{kj}(x) \qquad (2.102)$$

The transition probabilities must satisfy the following conditions

$$0 \le p_{ij}(x) \le 1 \qquad (2.103)$$

and

$$\sum_j p_{ij}(x) \le 1 \qquad (2.104)$$

In Equation (2.104), if $\sum_j p_{ij}(x) = 1$ for all i and x, then the process is
called honest but if the inequality holds then there is non-zero probability of the
process escaping to infinity and such a process is termed dishonest. This book is
concerned only with honest processes. In the case of discrete parameter chains,
the basic elements are the one step transition probabilities. In a continuous
parameter case the equivalent elements are the limiting values, i.e. as x → 0. Define
the transition intensity or rate as

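for $i \ne j$

$$\lambda_{ij} = \left. \frac{dp_{ij}(x)}{dx} \right|_{x=0} = \lim_{\Delta x \to 0} \frac{p_{ij}(\Delta x)}{\Delta x}$$

i.e.

$$p_{ij}(\Delta x) = \lambda_{ij} \Delta x + o(\Delta x) \qquad (2.105)$$

for $i = j$

$$\lambda_{ii} = \left. \frac{dp_{ii}(x)}{dx} \right|_{x=0} = \lim_{\Delta x \to 0} \frac{p_{ii}(\Delta x) - 1}{\Delta x}$$

i.e.

$$p_{ii}(\Delta x) = \lambda_{ii} \Delta x + 1 + o(\Delta x) \qquad (2.106)$$

Differentiating both sides of (2.104) for equality and setting x = 0

$$\sum_j \lambda_{ij} = 0$$

i.e.

$$\lambda_{ii} = -\sum_{j \ne i} \lambda_{ij}$$

Therefore

$$p_{ii}(\Delta x) = 1 - \sum_{j \ne i} \lambda_{ij} \Delta x + o(\Delta x) \qquad (2.107)$$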



In Equation (2.105), $p_{ij}(\Delta x)$ represents the probability of transiting from
state i to state j during the interval of length $\Delta x$ and this is equal to $\lambda_{ij}\Delta x$
plus a term which when divided by $\Delta x$ tends to zero as $\Delta x \to 0$. Equation (2.107)
can be interpreted in a similar manner. Equation (2.102) can now be written for
a small increment of time $\Delta t$ as



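$$p_{ij}(t + \Delta t) = \sum_k p_{ik}(t)\, p_{kj}(\Delta t) = p_{ij}(t)\, p_{jj}(\Delta t) + \sum_{k \ne j} p_{ik}(t)\, p_{kj}(\Delta t) \qquad (2.108)$$

where

$$p_{ij}(t) = P(Z(t) = j \mid Z(0) = i)$$

Substituting from (2.105) and (2.106)

$$p_{ij}(t + \Delta t) = p_{ij}(t)(1 + \lambda_{jj} \Delta t) + \sum_{k \ne j} p_{ik}(t)\, \lambda_{kj} \Delta t + o(\Delta t)$$

so that

$$\frac{p_{ij}(t + \Delta t) - p_{ij}(t)}{\Delta t} = p_{ij}(t)\, \lambda_{jj} + \sum_{k \ne j} p_{ik}(t)\, \lambda_{kj} + \frac{o(\Delta t)}{\Delta t}$$

and as $\Delta t \to 0$

$$p'_{ij}(t) = \sum_k p_{ik}(t)\, \lambda_{kj}$$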

If $P_i(t)$ denotes the row vector whose jth element is $p_{ij}(t)$, i.e. the probability of
being in the jth state at time t given that the process was initially in state i, then
the above equation can be written as

$$P'_i(t) = P_i(t) R \qquad (2.109)$$

where R is the transition rate matrix whose (ij)th element is $\lambda_{ij}$. In a more general
form Equation (2.109) becomes

$$P'(t) = P(t) R \qquad (2.110)$$

where P(t) has $p_{ij}(t)$ as its (ij)th element. The initial condition for (2.110) is

$$P(0) = I$$

If, however the initial state of the system is defined by a probability
distribution in the form of a row vector p(0), the distribution at t is given by
p(0)P(t). The system of equations (2.110) is termed as the system of forward
equations.

At this point it is interesting to probe a little into the significance of the
transition rates. Let $X_{kj}$ be a random variable defining the duration of state k
under the condition that the next transition will be to state j. In accordance
with Equation (2.7), the hazard rate is

$$\phi_{kj}(x) = \lim_{\Delta x \to 0} \frac{P[x < X_{kj} < x + \Delta x \mid x < X_{kj}]}{\Delta x}$$

i.e.

$$P[x < X_{kj} < x + \Delta x \mid x < X_{kj}] = \phi_{kj}(x) \Delta x + o(\Delta x)$$

The left hand side can be interpreted as $p_{kj}(\Delta x)$ if the process has been in
state k for time x. If the process is to be Markovian then $\phi_{kj}(x)$ must be
independent of x as the process is independent of the past. Therefore

$$p_{kj}(\Delta x) = \phi_{kj} \Delta x + o(\Delta x)$$

Comparing with (2.105)

$$\lambda_{kj} = \phi_{kj}$$

That is the transition rate $\lambda_{kj}$ is the hazard rate of the random variable defining
the duration of state k under the condition of transiting to state j. The
exponential is the only distribution having a constant hazard rate and therefore
the random variables underlying the time homogeneous Markov process must be
exponentially distributed.

Although time homogeneous Markov chains are the main interest in system
reliability, there is no additional difficulty in extending the above arguments
to transition rates which are functions of system time, i.e. when $\lambda_{ij}$ is $\lambda_{ij}(t)$.
The process, however, becomes non-Markovian when the transition rates are a
function of the state residence times. These processes are treated in Chapter 6.
In the case when the transition rates are functions of system time t there exists
a family of matrices P(u, t), for t > u whose elements are

$$p_{ij}(u, t) = P(X(t) = j \mid X(u) = i)$$

In this case however the transition probability depends not only on (t - u)
but on u as well. The system of forward equations can now be written as

$$\frac{\partial P(u, t)}{\partial t} = P(u, t)\, R(t) \qquad (2.111)$$

This equation is called the Kolmogorov differential equation. 



Transient Behaviour 

Equation (2.110) is a system of linear differential equations with constant
coefficients. If the eigenvalues of R are distinct, the solution of Equation (2.110)
can be easily obtained in the form

$$P(t) = S D(t) S^{-1} \qquad (2.112)$$

where

D(t) = The diagonal matrix whose (ii)th element is $\exp\{r_i t\}$, $r_i$ being the
ith eigenvalue of R
S = The matrix formed by right eigenvectors of R
$S^{-1}$ = The matrix formed either by inverting S or from the left eigenvectors
of R

The proof of Equation (2.112) may be found in books on differential equations.
In practice if t is short, the solution may be found by the following technique.

$$P'(t) = P(t) R$$

As $\Delta t \to 0^+$

$$P(t + \Delta t) = P(t) + P'(t) \Delta t = P(t)(I + R \Delta t)$$

If time t is divided into a very large number of equal intervals $\Delta t$, so that $\Delta t$ is
very small (→ 0), the above expression can be written as a recursive relationship

$$P(j \Delta t) = P((j - 1) \Delta t)\,[I + R \Delta t] \qquad (2.113)$$

It should be noted that (2.113) implies the approximation of a Markov
process in continuous time by a discrete time Markov process with steps
equal to $\Delta t$. For $i \ne j$ the (ij)th element of $(I + R \Delta t)$ is $\lambda_{ij} \Delta t$, i.e. the
probability of transiting from state i to state j in one step of length $\Delta t$.
Therefore $[I + R \Delta t]$ is a one step transition probability matrix. It can also be
seen that

$$P(j \Delta t) = [I + R \Delta t]^j$$

which is the matrix multiplication technique in the discrete time case.
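The recursion (2.113) can be tried out on a small case. The sketch below assumes a hypothetical two-state process with rates 0.1 and 0.9 per unit time (values chosen only for illustration) and compares the result with the known closed-form solution of the two-state process.

```python
import math

def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transient_matrix(rate, t, steps):
    """Approximate P(t) by [I + R dt]**steps with dt = t / steps."""
    dt = t / steps
    n = len(rate)
    step = [[(1.0 if i == j else 0.0) + rate[i][j] * dt for j in range(n)]
            for i in range(n)]
    p = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        p = mat_mul(p, step)
    return p

LAM, MU = 0.1, 0.9            # assumed transition rates, for illustration only
R = [[-LAM, LAM],
     [MU, -MU]]

P_t = transient_matrix(R, t=1.0, steps=10000)

# Exact probability of being in state 0 at t = 1 when starting in state 0.
exact_p00 = MU / (LAM + MU) + LAM / (LAM + MU) * math.exp(-(LAM + MU) * 1.0)
print(P_t[0][0], exact_p00)
```

The step length must be small enough for every entry of $[I + R \Delta t]$ to lie between 0 and 1, otherwise the matrix is not a valid one step transition probability matrix.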



Equilibrium Probability Distribution 

As t → ∞, the probability distribution of Z(t) tends to an equilibrium distribution.
For all processes having a finite number of mutually communicating states, the
unique solution can be found by solving (N - 1) equations from

$$pR = 0 \qquad (2.114)$$

and

$$\sum_i p_i = 1 \qquad (2.115)$$

where p is a row vector whose ith element $p_i$ is the steady state probability of
being in the ith state. It can be proved that $p_i$ equals the expected value of the
proportion of a long realization spent in state i. In most cases the steady state
probabilities are the only quantities of interest.



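Example: One of the processes commonly encountered in reliability studies
is the two state Markov process. The state transition diagram for this process is
shown below.

Fig. 2.8 Two-state Markov process

The transition rate matrix

$$R = \begin{bmatrix} -\lambda & \lambda \\ \mu & -\mu \end{bmatrix}$$

Taking the Laplace transform of Equation (2.110)

$$s\bar{P}(s) - P(0) = \bar{P}(s) R$$

i.e.

$$\bar{P}(s) = P(0)\,[sI - R]^{-1}$$

For the two state process

$$\bar{P}(s) = P(0) \begin{bmatrix} s + \lambda & -\lambda \\ -\mu & s + \mu \end{bmatrix}^{-1} = P(0)\, \frac{1}{s(s + \lambda + \mu)} \begin{bmatrix} s + \mu & \lambda \\ \mu & s + \lambda \end{bmatrix}$$

The probability vector p(t) of the state probabilities can be obtained by

$$(p_0(t) \;\; p_1(t)) = (p_0(0) \;\; p_1(0))\, P(t)$$

Therefore

$$p_0(t) = \frac{\mu}{\lambda + \mu}\,(p_0(0) + p_1(0)) + \left( p_0(0)\frac{\lambda}{\lambda + \mu} - p_1(0)\frac{\mu}{\lambda + \mu} \right) e^{-(\lambda + \mu)t}$$

$$= \frac{\mu}{\lambda + \mu} + \frac{e^{-(\lambda + \mu)t}}{\lambda + \mu}\,[\lambda p_0(0) - \mu p_1(0)] \qquad (2.116)$$

and

$$p_1(t) = \frac{\lambda}{\lambda + \mu} + \frac{e^{-(\lambda + \mu)t}}{\lambda + \mu}\,[\mu p_1(0) - \lambda p_0(0)] \qquad (2.117)$$

As t → ∞

$$p_0(t) = p_0 = \frac{\mu}{\lambda + \mu}$$

and

$$p_1(t) = p_1 = \frac{\lambda}{\lambda + \mu}$$

The steady state solution can also be obtained by the application of Equations
(2.114) and (2.115)

$$(p_0 \;\; p_1) \begin{bmatrix} -\lambda & \lambda \\ \mu & -\mu \end{bmatrix} = 0$$

Therefore

$$-\lambda p_0 + \mu p_1 = 0$$

and

$$\lambda p_0 - \mu p_1 = 0$$

One of these identical equations can be used with

$$p_0 + p_1 = 1$$

to give

$$p_0 = \frac{\mu}{\lambda + \mu}$$

and

$$p_1 = \frac{\lambda}{\lambda + \mu}$$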
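Equations (2.116) and (2.117) are easy to evaluate directly. In the sketch below the function name and the numerical rates are assumptions chosen for illustration; state 0 is left at rate $\lambda$ and state 1 at rate $\mu$, as in the rate matrix of the example.

```python
import math

def two_state_probabilities(lam, mu, t, p0_start=1.0):
    """Return (p0(t), p1(t)) for the two state process, Eqs (2.116)-(2.117)."""
    p1_start = 1.0 - p0_start
    decay = math.exp(-(lam + mu) * t)
    p0 = mu / (lam + mu) + decay * (lam * p0_start - mu * p1_start) / (lam + mu)
    p1 = lam / (lam + mu) + decay * (mu * p1_start - lam * p0_start) / (lam + mu)
    return p0, p1

LAM, MU = 0.1, 0.9   # assumed rates, for illustration only

for t in (0.0, 1.0, 5.0, 50.0):
    print(t, two_state_probabilities(LAM, MU, t))

steady_p0 = MU / (LAM + MU)   # limiting value of p0(t)
```

The transient term decays as $e^{-(\lambda + \mu)t}$, so the steady state values are reached within a few multiples of $1/(\lambda + \mu)$ whatever the initial condition.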



The probabilities $p_0$ and $p_1$ are independent of the initial condition. This
seems to be intuitively true because after a long time many transitions between
0 and 1 would have taken place and therefore the effect of the initial condition
tends to diminish. The probabilities $p_0$ and $p_1$ can be interpreted in two ways.
The first interpretation can be in terms of an average taken over a large number
of the realizations taken at a single point in time. If out of n realizations, the
process is $n_0$ times in state 0 at a time t remote from the time origin then

$$p_0 = \frac{n_0}{n}$$

The second interpretation is in terms of the limiting proportion of time spent
in state 0 in a single long realization. The parameters $\lambda$ and $\mu$ are the hazard rates
of exponential distributions and therefore they are the reciprocals of the mean
time spent in state 0 and state 1, i.e.

$$\lambda = \frac{1}{E(X_0)}, \qquad \mu = \frac{1}{E(X_1)}$$

where $X_0$ and $X_1$ are the random variables denoting durations of 0 and 1 state.

Considering 2n transitions in a single long realization of the stochastic
process, the process will be n times in state 0 and n times in state 1. Therefore,
the proportion of time spent in state 0 in 2n transitions is

$$R_0^{(n)} = \frac{X_{01} + X_{02} + \ldots + X_{0n}}{(X_{01} + X_{02} + \ldots + X_{0n}) + (X_{11} + X_{12} + \ldots + X_{1n})}$$

Dividing both numerator and denominator by n

$$R_0^{(n)} = \frac{\bar{X}_0}{\bar{X}_0 + \bar{X}_1}$$

As the number of transitions tends to be large, the average values tend to the
means (law of large numbers) and therefore

$$\lim_{n \to \infty} R_0^{(n)} = \frac{E(X_0)}{E(X_0) + E(X_1)} = \frac{\mu}{\lambda + \mu} = p_0$$

Therefore $p_0$ is the limiting proportion of the time spent in state 0 in a single
long realization of the two state stochastic process. A similar interpretation
holds for state 1.
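The second interpretation can be illustrated by simulation. The sketch below draws exponentially distributed durations for a hypothetical two-state process (the rates and the seed are arbitrary assumptions) and computes the proportion of time spent in state 0 over n visits to each state.

```python
import random

def fraction_of_time_in_state0(lam, mu, cycles, seed=1):
    """Simulate `cycles` visits to each state and return the share of time in 0."""
    rng = random.Random(seed)
    # Durations of state 0 have hazard rate lam, those of state 1 hazard rate mu.
    time_in_0 = sum(rng.expovariate(lam) for _ in range(cycles))
    time_in_1 = sum(rng.expovariate(mu) for _ in range(cycles))
    return time_in_0 / (time_in_0 + time_in_1)

LAM, MU = 0.1, 0.9   # assumed rates, for illustration only
estimate = fraction_of_time_in_state0(LAM, MU, cycles=20000)
steady_p0 = MU / (LAM + MU)
print(estimate, steady_p0)
```

With a few thousand cycles the simulated proportion is already close to $\mu/(\lambda + \mu)$, as the law of large numbers argument suggests.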



First Passage Times 

Denote the first passage time from state i to state j by $T_{ij}$, i.e. this is the time
to enter state j for the first time starting in state i. If the state j is now made an
absorbing state, the behaviour of the new stochastic process and the original
process is the same until meeting j for the first time. If $p_{ij}(t)$ is the probability
of being in state j, starting in state i for the new process then

$$P(T_{ij} \le t) = p_{ij}(t)$$

The probability density function $f_{ij}(t)$ can be found by differentiation

$$f_{ij}(t) = \frac{d}{dt} P(T_{ij} \le t) = \frac{d}{dt}\, p_{ij}(t)$$

The Laplace transform can be obtained by

$$\bar{f}_{ij}(s) = s\,\bar{p}_{ij}(s) \qquad (2.118)$$

The bar indicates a Laplace transform. After evaluating the right hand side, the
explicit density function can be obtained by inversion. The moments of the
first passage times can be found by referring to Equation (2.43).

The kth moment of the first passage time can be found by differentiating
the transform k times, as in Equation (2.43)

$$\bar{T}_{ij}^{(k)} = (-1)^k \left. \frac{d^k \bar{f}_{ij}(s)}{ds^k} \right|_{s=0} \qquad (2.119)$$


If the absorbing state is the failed state, then the mean first passage time
represents the MTTFF. The above procedure can be conveniently carried out in
the matrix form. Let the states 1 to J be the elements of subset $X^+$ and J + 1 to
N be the elements of $X^-$. It is required to find the first passage time to the
subset $X^-$. The matrix of transition rates can now be partitioned as follows

$$R = \begin{bmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{bmatrix}$$

where

$R_{11}$ is a J x J matrix
$R_{12}$ is a J x (N - J) matrix
$R_{21}$ is a (N - J) x J matrix
and
$R_{22}$ is a (N - J) x (N - J) matrix



The states $j \in X^-$ are now absorbing states and therefore $R_{21}$ and $R_{22}$ are set to
zero. Let p(t) be the vector of state probabilities for an initial starting condition.
This vector can be expressed as $(p_+(t) \;\; p_-(t))$ where $p_+(t)$ and $p_-(t)$ are the vectors
containing the states $i \in X^+$ and $i \in X^-$ respectively. The forward differential
equation now becomes

$$\frac{d}{dt}\,(p_+(t) \;\; p_-(t)) = (p_+(t) \;\; p_-(t)) \begin{bmatrix} R_{11} & R_{12} \\ 0 & 0 \end{bmatrix}$$

Therefore

$$p'_+(t) = p_+(t) R_{11}$$

and

$$p'_-(t) = p_+(t) R_{12}$$

Taking the Laplace transforms

$$s\bar{p}_+(s) - p_+(0) = \bar{p}_+(s) R_{11}$$

$$s\bar{p}_-(s) = \bar{p}_+(s) R_{12}$$

$p_-(0) = 0$ as the process started in $i \in X^+$. These equations can be rearranged

$$\bar{p}_+(s) = p_+(0)\,[sI - R_{11}]^{-1} \qquad (2.120)$$

and

$$\bar{p}_-(s) = \frac{1}{s}\, p_+(0)\,[sI - R_{11}]^{-1} R_{12} \qquad (2.121)$$

The probability of being in subset $X^-$ at time t is $p_-(t) U_{N-J}$ where $U_{N-J}$ is a
unit vector, i.e. a column vector with all entries 1, of dimension N - J. From
Equation (2.118) it can be seen that the Laplace transform of the probability
density function of the first passage time is

$$\bar{f}(s) = s\,\bar{p}_-(s)\, U_{N-J} = p_+(0)\,(sI - R_{11})^{-1} R_{12} U_{N-J}$$

Since the rows of the transition rate matrix sum to zero

$$R_{12} U_{N-J} = -R_{11} U_J$$

Therefore

$$\bar{f}(s) = -p_+(0)\,[sI - R_{11}]^{-1} R_{11} U_J \qquad (2.122)$$

The kth initial moment can be found by Equation (2.119)

$$\bar{T}^{(k)} = k!\; p_+(0)\,(-R_{11})^{-k}\, U_J \qquad (2.123)$$

The mean is

$$\bar{T} = \bar{T}^{(1)} = p_+(0)\,(-R_{11})^{-1}\, U_J \qquad (2.124)$$

If the process started in the first state

$$\bar{T} = (1 \;\; 0 \;\; 0 \;\ldots\; 0)\,(-R_{11})^{-1}\, U_J$$

If $X^-$ represents the failed condition, then $\bar{T}$ is the MTTFF. It should be
realized that Equation (2.124) can be derived from the theory of discrete time
Markov chains by assuming that each step of the chain is $\Delta t \to 0$. The matrix of
one step transition probabilities becomes $[I + R \Delta t]$ and by truncating the
absorbing states $Q = [I + R_{11} \Delta t]$ and therefore the fundamental matrix

$$N = [I - Q]^{-1} = \frac{1}{\Delta t}\,[-R_{11}]^{-1}$$

This matrix gives the number of steps spent in the different states. The time
spent in the different states can be obtained by multiplying by $\Delta t$, i.e. the
step length. From this point on it is easy to see that

$$\bar{T} = p_+(0)\,[-R_{11}]^{-1}\, U_J$$
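Equation (2.124) can be illustrated on a small hypothetical system. The sketch assumes three states, with states 1 and 2 working and state 3 failed, and rates 1→2 = 2, 2→1 = 1 and 2→3 = 1 per unit time; these numbers and the helper `inverse_2x2` are assumptions made only for this example.

```python
from fractions import Fraction as F

def inverse_2x2(m):
    """Closed-form inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Transition rates among the working states only (the block R11).
# Row sums are not zero because state 2 can also leave to the failed state.
R11 = [[F(-2), F(2)],
       [F(1), F(-2)]]

neg_R11 = [[-v for v in row] for row in R11]
time_in_states = inverse_2x2(neg_R11)   # entry (i, k): mean time in k, starting in i

# Mean time to first failure for each possible starting state: row sums,
# i.e. multiplication by the vector of ones.
mttff = [sum(row) for row in time_in_states]
print(mttff)
```

The individual entries of $(-R_{11})^{-1}$ give the mean time spent in each working state before the first failure, in the same way as the fundamental matrix N does for the discrete chain.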

The first passage time represents the time of entering a state or a set of
states for the first time, starting in a particular state. It is sometimes necessary,
however, to find the mean time spent in subset $X^+$ or $X^-$. For example, if
$X^+$ and $X^-$ represent the up and down states respectively, these time
parameters represent the mean up time and the mean down time. In order to
calculate these quantities, it is necessary to know the probabilities of beginning
$X^+$ in the various states, which are its elements. Denoting the steady state
probabilities of being in various states of the original process by $p_i$, the
probability of beginning $X^+$ in state j is

$$\frac{\sum_{i \in X^-} p_i \lambda_{ij}}{\sum_{j \in X^+} \sum_{i \in X^-} p_i \lambda_{ij}}$$

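In vector form

$$p_+(0) = \frac{p_- R_{21}}{p_- R_{21} U_J}$$

In the steady state

$$p_+ R_{11} + p_- R_{21} = 0$$

Therefore

$$p_+(0) = \frac{-p_+ R_{11}}{-p_+ R_{11} U_J} = \frac{-p_+ R_{11}}{p_+ R_{12} U_{N-J}}$$

Substituting in (2.124), the mean stay in $X^+$ is

$$\bar{T}^+ = \frac{-p_+ R_{11} (-R_{11})^{-1} U_J}{p_+ R_{12} U_{N-J}} = \frac{p_+ U_J}{p_+ R_{12} U_{N-J}}$$

$$= \frac{\sum_{i \in X^+} p_i}{\sum_{i \in X^+} p_i \sum_{j \in X^-} \lambda_{ij}} = \frac{\sum_{i \in X^+} p_i}{\sum_{i \in X^-} p_i \sum_{j \in X^+} \lambda_{ij}} \qquad (2.125)$$

In a similar manner the mean duration in $X^-$

$$\bar{T}^- = \frac{\sum_{i \in X^-} p_i}{\sum_{i \in X^-} p_i \sum_{j \in X^+} \lambda_{ij}} = \frac{\sum_{i \in X^-} p_i}{\sum_{i \in X^+} p_i \sum_{j \in X^-} \lambda_{ij}} \qquad (2.126)$$

The mean cycle time, i.e. the time between two successive encounters of
$X^+$ or $X^-$

$$\bar{T} = \bar{T}^+ + \bar{T}^- = \frac{1}{\sum_{i \in X^+} p_i \sum_{j \in X^-} \lambda_{ij}} = \frac{1}{\sum_{i \in X^-} p_i \sum_{j \in X^+} \lambda_{ij}} \qquad (2.127)$$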

In the next chapter these relationships will be derived from the frequency 
balancing technique. 
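Equations (2.125) to (2.127) can be evaluated once the steady state probabilities are known. The sketch below uses a hypothetical three-state system (states 1 and 2 up, state 3 down, with assumed rates 1→2 = 2, 2→1 = 1, 2→3 = 1 and 3→1 = 4); the rates, the quoted steady state vector and the function name `frequency` are assumptions for illustration.

```python
from fractions import Fraction as F

# Transition rate matrix; every row sums to zero.
R = [[F(-2), F(2), F(0)],
     [F(1), F(-2), F(1)],
     [F(4), F(0), F(-4)]]

# Steady state probabilities satisfying pR = 0 and sum(p) = 1.
p = [F(4, 9), F(4, 9), F(1, 9)]

UP, DOWN = [0, 1], [2]        # zero-based indices of the subsets X+ and X-

def frequency(prob, rate, source, target):
    """Sum of p_i * lambda_ij over i in source and j in target."""
    return sum(prob[i] * rate[i][j] for i in source for j in target)

f_down = frequency(p, R, UP, DOWN)   # denominator of (2.125), first form
f_up = frequency(p, R, DOWN, UP)     # denominator of (2.125), second form

mean_up = sum(p[i] for i in UP) / f_down       # Equation (2.125)
mean_down = sum(p[i] for i in DOWN) / f_down   # Equation (2.126)
mean_cycle = 1 / f_down                        # Equation (2.127)
print(mean_up, mean_down, mean_cycle)
```

The equality of `f_down` and `f_up` is what allows the denominators of (2.125) to (2.127) to be written in either of the two forms.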

Exercises 

1. X is a non-negative continuous random variable such that conditional on X
being greater than a fixed value t > 0, the probability density function of
X - t is the same as the unconditional probability density function of X.
Prove that X has a negative exponential probability density function.

2. Assume that X is normally distributed with $\mu = 0.4$ and $\sigma = 4$, find

(a) P(X > 1.5)

(b) P(X < 0.5)

(c) P(-3 < X < 1)

3. Suppose that $X_i$, i = 1, 2, ..., n are independent random variables, gamma
distributed with parameters $\alpha_i$, $\rho$. Prove that the random variable
$\sum_{i=1}^{n} X_i$ is also gamma distributed with parameters $\sum_{i=1}^{n} \alpha_i$ and $\rho$. This is
called the reproductive property of the gamma distribution.

4. Find the 1, 2, 3 and 4 step transition probability matrices for the following
single step transition matrix. Does it exhibit any special characteristic?



2 2 
0 0 
0 0 






5. The state transition diagram of a continuous time Markov chain is given 
below. The states 1 and 2 are working states and state 3 is failed state. 
Calculate 

(a) The availability, i.e. the steady state probability of being in the working 
state 

(b) MTTFF 

(c) Mean cycle time 




References 

1. U. Narayan Bhat, Elements of Applied Stochastic Processes, John Wiley
(1972). 

2. R. Billinton, Power System Reliability Evaluation, Gordon and Breach (1970). 

3. J.A. Buzacott, Markov Approach to Finding Failure Times of Repairable 
Systems, IEEE Transactions on Reliability, R-19, 4, (1970). 

4. A.B. Clarke and R.L. Disney, Probability and Random Processes for 
Engineers and Scientists, John Wiley (1970). 

5. D.R. Cox and H.D. Miller, The Theory of Stochastic Processes, Methuen (1965). 

6. W. Feller, An Introduction to Probability Theory and Its Applications, 
1. (Third Edition) John Wiley (1958). 

7. J.G. Kemeny and J.L. Snell, Finite Markov Chains, Van Nostrand (1960). 

8. E. Parzen, Stochastic Processes, Holden-Day (1962). 

9. G.H. Sandler, System Reliability Engineering, Prentice-Hall (1963). 



CHAPTER 3 

Frequency and Associated Concepts 



Introduction 

This chapter develops the frequency balancing approach to the stochastic 
process. This method has been used with considerable success in the field of 
power system reliability evaluation and is known as the frequency and duration 
approach. The concept of frequency is examined in detail and used to derive 
expressions for the mean cycle time and the mean duration of a state. In 
Chapter 5 the concept of frequency is used to derive conditions of mergeability 
which are very useful in dealing with large systems. 



Interstate Transition Rate 

Consider a large number of identical systems having the underlying stochastic
process Z(t). All the systems start in state i and further assume that in one step,
state i can communicate with either state j or state k. If the histories of all the
systems are now plotted they may appear as shown in Fig. 3.1. These
observations are called independent realizations of the stochastic process Z(t).
Further, if N realizations in which the system transits to state j are separated out,
the durations of the system in state i in this subset are called N independent
realizations of the random variable $X_{ij}$, i.e. the duration of the state i under the
condition that the system will transit to state j. In the stochastic process where
all the states intercommunicate, these realizations could be obtained from a
single long realization of the system by observing from the moment of each
entry into state i until the termination of state i by transiting to state j.



Define:

$f_{ij}(x)$ = The probability density function of the random variable $X_{ij}$

$F_{ij}(x)$ = The distribution function of the random variable $X_{ij}$
$= P(X_{ij} \le x)$
$= \int_0^x f_{ij}(y)\, dy$

and

$S_{ij}(x)$ = The survivor function of the random variable $X_{ij}$
$= \int_x^\infty f_{ij}(y)\, dy$

Now introduce the random variable

n(x) = The number of realizations in which the system is in state i
at time x.

Then

$$E\{n(x)\} = \text{The expected value of } n(x) = N \int_x^\infty f_{ij}(y)\, dy = \eta(x) \qquad (3.1)$$

The expected number of transitions to state j in the interval $(x, x + \Delta x)$
is given by

$$\Delta\eta(x) = \eta(x) - \eta(x + \Delta x)$$



[Fig. 3.1. Independent realizations of the stochastic process $Z(t)$; horizontal axis: time from the origin.]



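and the rate of transition from state $i$ to state $j$, per realization surviving up to
$x$, is

$$\lambda_{ij}(x) = \lim_{\Delta x \to 0^+} \frac{\Delta\eta(x)}{\eta(x)\,\Delta x} = -\frac{1}{\eta(x)}\,\frac{d\eta(x)}{dx} \tag{3.2}$$

Substituting the values of $\eta(x)$ and $d\eta(x)/dx$ from (3.1)

$$\lambda_{ij}(x) = \frac{f_{ij}(x)}{S_{ij}(x)} \tag{3.3}$$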

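The transition rate from state $i$ to state $j$ is therefore the same as the hazard rate
associated with the random variable $X_{ij}$. The above derivation is useful in getting
an appreciation of the time specific transition rate as a relative expected rate. It
is, however, interesting to transform Equation (3.3) into another form.

$$\begin{aligned}
\lambda_{ij}(x) &= \frac{f_{ij}(x)}{S_{ij}(x)} \\
&= \lim_{\Delta x \to 0^+} \frac{F_{ij}(x + \Delta x) - F_{ij}(x)}{\Delta x\, S_{ij}(x)} \\
&= \lim_{\Delta x \to 0^+} \frac{P(x < X_{ij} \le x + \Delta x)}{\Delta x} \cdot \frac{1}{P(x < X_{ij})} \\
&= \lim_{\Delta x \to 0^+} \frac{P(x < X_{ij} \le x + \Delta x \mid x < X_{ij})\, P(x < X_{ij})}{\Delta x\, P(x < X_{ij})} \\
&= \lim_{\Delta x \to 0^+} \frac{P(x < X_{ij} \le x + \Delta x \mid x < X_{ij})}{\Delta x}
\end{aligned}$$

That is, as $\Delta x \to 0^+$

$$\lambda_{ij}(x)\,\Delta x = P(x < X_{ij} \le x + \Delta x \mid x < X_{ij})$$

= The probability of transiting from state $i$ to state $j$ in the
interval $(x, x + \Delta x)$ given that this transition has not taken
place up to time $x$

= The probability of a single transition from state $i$ to state $j$
at the age $x$ of state $i$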



Therefore, as $\Delta x \to 0^+$

The expected number of transitions from state $i$ to state $j$ in the interval
$(x, x + \Delta x)$

$$= 1 \cdot P(x < X_{ij} \le x + \Delta x \mid x < X_{ij}) = \lambda_{ij}(x)\,\Delta x$$

i.e. $\lambda_{ij}(x)$ = The expected or the mean transition rate from state $i$ to
state $j$ at the age $x$ of state $i$.

The quantity $\lambda_{ij}(x)$ is called the age specific transition rate and under particular
conditions may be designated as the failure rate, hazard function, etc. The
concept of the interstate transition rate will be further treated in the next chapter
while dealing with the state transition diagram. When $X_{ij}$ is exponentially
distributed with probability density function $\rho \exp\{-\rho x\}$, the age specific
transition rate, as shown in the last chapter, is constant, i.e.

$$\lambda_{ij}(x) = \lambda_{ij} = \rho = \frac{1}{\text{Mean value of } X_{ij}}$$

The transition rate is, therefore, constant, i.e. independent of the age of the
state, and is equal to the reciprocal of the mean of the random variable $X_{ij}$. This
property is true only for the exponential distribution. When all the random
variables generating the stochastic process are exponentially distributed, the
transition rates are constant and the process is Markovian. The case of constant
transition rates will be treated in detail in this chapter. The discussion of
non-Markovian processes is deferred to Chapter 6.
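The relationship between the age specific transition rate and the hazard rate can be checked numerically. The following is a minimal Python sketch, not part of the original treatment: the function names, the rate `rho` and the Weibull parameters are illustrative assumptions. It evaluates the ratio of density to survivor function from Equation (3.3) for an exponential duration, where the rate should not depend on age, and for a Weibull duration, where it does.

```python
import math

def hazard_rate(pdf, survivor, x):
    """Age specific transition rate f_ij(x) / S_ij(x), as in Equation (3.3)."""
    return pdf(x) / survivor(x)

# Exponential duration with rate rho (mean 1/rho); rho is an illustrative value.
rho = 0.5

def exp_pdf(x):
    return rho * math.exp(-rho * x)

def exp_survivor(x):
    return math.exp(-rho * x)

# Weibull duration, included only for contrast; scale and shape are illustrative.
scale, shape = 2.0, 1.5

def weibull_pdf(x):
    return (shape / scale) * (x / scale) ** (shape - 1) * math.exp(-((x / scale) ** shape))

def weibull_survivor(x):
    return math.exp(-((x / scale) ** shape))

for age in (0.5, 1.0, 2.0, 4.0):
    # The exponential column stays at rho; the Weibull column changes with age.
    print(age, hazard_rate(exp_pdf, exp_survivor, age),
          hazard_rate(weibull_pdf, weibull_survivor, age))
```

With the exponential density the printed rate equals `rho`, the reciprocal of the mean duration, at every age; with a shape parameter other than one the rate changes with age, which is the non-Markovian case.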



The Concept of Frequency 

The state space $X$ of the stochastic process $Z(t)$ is assumed to be partitioned into
two disjoint subsets $X^+$ and $X^-$. If any state of the subset $X^+$ is entered, that subset
is said to have been encountered.
Define:

$f_+(t)$ = The time specific frequency of encountering the subset $X^+$. This is the
expected rate at which $X^+$ is encountered at time $t$. For $\Delta t \to 0^+$, $f_+(t)\Delta t$
represents the expected number of times $X^+$ is encountered in the interval
$(t, t+\Delta t)$.

$E_{ij}(t)$ = The time specific frequency of encountering state $j$ from state $i$. This is
the expected rate at which state $j$ is encountered from state $i$ or the mean rate
at which the system transfers from state $i$ to state $j$ at time $t$.



$p_i(t)$ = The probability of being in state $i$ at time $t$, for the given initial
condition.

$P_+(t)$ = The probability of the system being in $X^+$ at time $t$ for the given initial
conditions

$$P_+(t) = \sum_{i \in X^+} p_i(t)$$

$\lambda_{ij}$ = The transition rate from state $i$ to state $j$.

State $j$ is encountered from state $i$ at a rate $\lambda_{ij}$ if the system is in state $i$, but if
the system is not in state $i$ then this rate is obviously zero. The transition rate
or the encounter rate can therefore be represented by a discrete random variable
$\xi_{ij}(t)$ such that

$$\xi_{ij}(t) = \begin{cases} \lambda_{ij} & \text{if } Z(t) = i \\ 0 & \text{if } Z(t) \ne i \end{cases}$$

Its expected value is

$$E_{ij}(t) = \lambda_{ij}\, P\{Z(t) = i\} + 0 \cdot P\{Z(t) \ne i\} = \lambda_{ij}\, p_i(t)$$

Since the states are all mutually exclusive

$$f_+(t) = \sum_{i \in X^-} \sum_{j \in X^+} p_i(t)\,\lambda_{ij} \tag{3.5}$$

If there is only one state $j$ in $X^+$, then the time specific frequency of
encountering it is

$$f_j(t) = \sum_{i \ne j} p_i(t)\,\lambda_{ij} \tag{3.6}$$

Similarly, denoting the expected transition rate from $j$ to $i \in X^-$ by $E_j(t)$

$$E_j(t) = p_j(t) \sum_{i \ne j} \lambda_{ji}$$



As $\Delta t \to 0^+$, $f_j(t)\Delta t$ and $E_j(t)\Delta t$ represent respectively the expected number of
transitions into and out of state $j$ in the interval $(t, t + \Delta t)$. Since the
probability of more than one transition in $(t, t + \Delta t)$ can be reasonably assumed
to be $o(\Delta t)$, the quantities $f_j(t)\Delta t$ and $E_j(t)\Delta t$ can be interpreted as the probabilities
of a single transition into and out of state $j$. The difference $[f_j(t)\Delta t - E_j(t)\Delta t]$
therefore represents the increase in the probability of being in state $j$, i.e.

$$\Delta p_j(t) = f_j(t)\,\Delta t - E_j(t)\,\Delta t$$

As $\Delta t \to 0^+$

$$\frac{dp_j(t)}{dt} = f_j(t) - E_j(t) \tag{3.7}$$



This can be recognized as the forward differential equation of state $j$. In matrix
form

$$A\,p(t) = p'(t) \tag{3.8}$$

where

$p(t)$ = The column matrix whose $i$th value $p_i(t)$ represents the probability
of being in state $i$ at time $t$ for the given initial condition.

$p'(t)$ = The derivative of $p(t)$ with respect to time

$A$ = The transpose of the transition rate matrix used in the Markov
approach.

The frequency balancing approach has been used to write the forward 
differential equations for the system and can be extended to derive expressions 
for the system reliability indices both in the time specific as well as the steady 
state domains. 
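The frequency balance of Equation (3.7) can be evaluated directly from a table of transition rates. The sketch below is illustrative only: the three-state rate table, the probability vector and the function names are assumptions, not values from the text. It computes the frequency into each state, the frequency out of it, and their difference, which is the right hand side of the forward differential equation.

```python
def frequency_into(rates, p, j):
    """f_j(t): sum over i != j of p_i(t) * lambda_ij, Equation (3.6)."""
    return sum(p[i] * rates[i][j] for i in range(len(p)) if i != j)

def frequency_out_of(rates, p, j):
    """E_j(t): p_j(t) times the sum over i != j of lambda_ji."""
    return p[j] * sum(rates[j][i] for i in range(len(p)) if i != j)

def dp_dt(rates, p):
    """Right hand side of Equation (3.7) for every state."""
    return [frequency_into(rates, p, j) - frequency_out_of(rates, p, j)
            for j in range(len(p))]

# Hypothetical 3-state system; rates[i][j] holds lambda_ij, the diagonal is unused.
rates = [[0.0, 0.2, 0.1],
         [1.0, 0.0, 0.3],
         [0.5, 2.0, 0.0]]
p = [0.7, 0.2, 0.1]          # assumed state probabilities at some time t

derivs = dp_dt(rates, p)
print(derivs)                # rate of change of each state probability
print(sum(derivs))           # probabilities sum to one, so this is zero up to rounding
```

Because every transition out of one state is a transition into another, the derivatives always sum to zero, which is a convenient check on a hand-written set of state equations.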



Time Specific Domain 

It is often necessary to examine the probable system performance over a finite 
interval of time. It is usual in the literature to define reliability indices in terms 
of system success or system failure. However, in many complex systems, there 
may be more than one degraded mode of failure. For example, a large chemical 
plant may not be just up or down but may have many possible capacity states. 
This is also true of transportation systems which may not be just available or 
unavailable and may have degrees of availability or unavailability. It is therefore 
appropriate to define reliability measures in terms of a subset $X^+$, where this
subset may contain just one state or several states of the system. In particular
applications the subset will be referred to as success, failure, or some other
appropriate name. In the transient domain the following indices are commonly 
used for repairable systems: 






1 Time Specific Availability of Subset $X^+$

This is also designated in the literature as pointwise availability or instant
availability and is the probability of the system being in any state contained in
subset $X^+$ at a particular instant of time $t$. This will be denoted by $A_+(t)$ and,
since all states of the system are mutually exclusive, i.e. separated in time,

$$A_+(t) = \sum_{i \in X^+} p_i(t) \tag{3.9}$$

The probability of being in state $i$ at time $t$ can be found from Equation
(3.8) and has been discussed in Chapter 2. When $X^+$ is constituted by the
system states which denote system failure, this can be called the unavailability
of the system at time $t$ and designated by $U(t)$. This is probably the most
widely used index in the transient domain.



2 Fractional Duration of Subset $X^+$

The fractional duration of subset $X^+$ in the interval $(t_1, t_2)$ is also known as the
interval availability or average time in $X^+$ and is defined as the expected
proportion of the interval $(t_1, t_2)$ spent in $X^+$ and denoted by $D_+(t_1, t_2)$. Since
the states of the system are separated in time, the expected duration of $X^+$ is
the sum of the expected durations in the states constituting $X^+$, i.e.

$$D_+(t_1, t_2) = \frac{1}{t_2 - t_1} \sum_{i \in X^+} \int_{t_1}^{t_2} p_i(t)\,dt = \frac{\int_{t_1}^{t_2} A_+(t)\,dt}{t_2 - t_1} \tag{3.10}$$



Equation (3.10) can be understood by considering probability as a relative
duration. The probability of being in $X^+$ at time $t$, i.e. $A_+(t)$, can be considered
constant over the interval $(t, t + \Delta t)$ as $\Delta t \to 0^+$. In a very large number of
realizations of the associated stochastic process, on the average a time
$A_+(t)\Delta t$ will therefore be spent in $X^+$ during
$(t, t + \Delta t)$. Considering $(t_1, t_2)$ to be divided into $m$ increments of $\Delta t$

$$D_+(t_1, t_2) = \lim_{m \to \infty} \frac{\sum A_+(t)\,\Delta t}{t_2 - t_1} = \frac{\int_{t_1}^{t_2} A_+(t)\,dt}{t_2 - t_1}$$



Another interpretation of Equation (3.10) can be provided by denoting the
state occupied at time $t$ by the indicator $S_i(t)$ such that

$S_i(t) = 1$ if the system is in state $i$
$= 0$ otherwise.

The proportion of $(t_1, t_2)$ spent in $X^+$ is, therefore

$$T_+(t_1, t_2) = \frac{1}{t_2 - t_1} \sum_{i \in X^+} \int_{t_1}^{t_2} S_i(t)\,dt$$

where the right hand side can be regarded as the limit of a time average taken
at points $(0, \Delta t, 2\Delta t, 3\Delta t, \ldots)$ as $\Delta t \to 0$. Therefore, the expected value

$$\begin{aligned}
D_+(t_1, t_2) &= E\{T_+(t_1, t_2)\} \\
&= \frac{1}{t_2 - t_1} \sum_{i \in X^+} \int_{t_1}^{t_2} E\{S_i(t)\}\,dt = \frac{1}{t_2 - t_1} \sum_{i \in X^+} \int_{t_1}^{t_2} P\{S_i(t) = 1\}\,dt \\
&= \frac{1}{t_2 - t_1} \sum_{i \in X^+} \int_{t_1}^{t_2} p_i(t)\,dt = \frac{\int_{t_1}^{t_2} A_+(t)\,dt}{t_2 - t_1}
\end{aligned}$$

3 Interval Frequency 

The interval frequency $F_+(t_1, t_2)$ is defined as the expected number of times the
subset $X^+$ is encountered in the interval $(t_1, t_2)$. Since the subset $X^+$ is said to
have been encountered once if the system transits from $X^-$ to $X^+$, $F_+(t_1, t_2)$
represents the expected number of transitions from $X^-$ to $X^+$ and not from $X^+$
to $X^-$. In this treatment the state transition rates are assumed constant, which
puts the stochastic process in the Markovian class. Similar treatments can,
however, also be made for non-Markovian processes.

The interval frequency can be obtained by integrating the Expression (3.5)
for time specific frequency over the interval, i.e.

$$F_+(t_1, t_2) = \int_{t_1}^{t_2} f_+(t)\,dt = \sum_{i \in X^-} \int_{t_1}^{t_2} p_i(t) \sum_{j \in X^+} \lambda_{ij}\,dt \tag{3.11}$$

Sometimes the interval frequency may be divided by the interval length to 
obtain average interval frequency. 






Methods of Calculation 

When the state space of the system is small, it may be possible to find explicit
expressions for the interval frequency and fractional duration. When the state
space becomes relatively large, this approach is not feasible. Methods for obtaining
time specific probabilities have already been described in Chapter 2. This
section extends those methods to the calculation of interval frequency and
fractional duration.

Method 1 

The time specific state probabilities can be found by solving the following
differential equation in matrix form

$$P'(t) = P(t)\,R$$

with the initial condition $P(0) = I$, i.e. the identity matrix.

Here, $P(t)$ = The matrix whose $(ij)$th term $P_{ij}(t)$ denotes the probability of
being in state $j$ given that the process was in state $i$ at $t = 0$

$R$ = The transition rate matrix.

It was shown in the last chapter that if $R$ has distinct eigenvalues then the
probability matrix can be expressed as

$$P(t) = S\,D(t)\,S^{-1} \tag{3.12}$$

where

$D(t)$ = The diagonal matrix whose $(ii)$th element is $\exp(r_i t)$, $r_i$ being
the $i$th eigenvalue of $R$

$S, S^{-1}$ = The matrices formed from the right and left eigenvectors of $R$

If the distribution at $t = 0$ is given by the row vector $p(0)$, the distribution at
time $t$ is given by

$$p(t) = p(0)\,P(t)$$

The $i$th element of the row vector $p(t)$ is denoted by $p_i(t)$ and represents
the probability of being in state $i$ at time $t$ for the given condition at $t = 0$.
After finding $p_i(t)$, $A_+(t)$ can be calculated using Equation (3.9).

Fractional duration 

Since only $D(t)$ on the right hand side of (3.12) is time dependent

$$\int_{t_1}^{t_2} p(t)\,dt = p(0)\,S \int_{t_1}^{t_2} D(t)\,dt\; S^{-1} = p(0)\,S\,M(t_1, t_2)\,S^{-1} \tag{3.13}$$

where

$M(t_1, t_2)$ = The diagonal matrix whose $(ii)$th term is

$$\frac{e^{r_i t_2} - e^{r_i t_1}}{r_i}$$

For the zero eigenvalue which every transition rate matrix possesses, this term
is taken as its limiting value $t_2 - t_1$.

Equation (3.13) thus yields $\int_{t_1}^{t_2} p_i(t)\,dt$ and by substituting these values
in Equation (3.10), $D_+(t_1, t_2)$ can be calculated.



Interval Frequency 

When the quantities $\int_{t_1}^{t_2} p_i(t)\,dt$ have been calculated by Equation (3.13), these can
be substituted in (3.11) to determine $F_+(t_1, t_2)$.
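Method 1 can be sketched numerically as follows. This is an illustration under stated assumptions rather than a reference implementation: it assumes NumPy is available, the three-state rate matrix and the values of `lam` and `mu` are hypothetical, and the function names are invented for the example. The zero eigenvalue is handled by using $t_2 - t_1$ for the corresponding diagonal term of $M(t_1, t_2)$.

```python
import numpy as np

lam, mu = 0.1, 1.0                       # illustrative rates, not from the text
R = np.array([[-2 * lam, 2 * lam, 0.0],  # hypothetical transition rate matrix
              [mu, -(lam + mu), lam],
              [0.0, 2 * mu, -2 * mu]])
p0 = np.array([1.0, 0.0, 0.0])           # process starts in the first state

r, S = np.linalg.eig(R)                  # R @ S = S @ diag(r), distinct eigenvalues assumed
S_inv = np.linalg.inv(S)

def state_probabilities(t):
    """p(t) = p(0) S D(t) S^-1, Equation (3.12)."""
    return (p0 @ S @ np.diag(np.exp(r * t)) @ S_inv).real

def integrated_probabilities(t1, t2):
    """Integral of p(t) over (t1, t2) = p(0) S M(t1, t2) S^-1, Equation (3.13)."""
    zero = np.abs(r) < 1e-12
    safe_r = np.where(zero, 1.0, r)      # avoid dividing by the zero eigenvalue
    m = np.where(zero, t2 - t1, (np.exp(r * t2) - np.exp(r * t1)) / safe_r)
    return (p0 @ S @ np.diag(m) @ S_inv).real

def interval_indices(t1, t2, plus):
    """Fractional duration (3.10) and interval frequency (3.11) of the subset `plus`."""
    integ = integrated_probabilities(t1, t2)
    minus = [i for i in range(len(p0)) if i not in plus]
    duration = integ[plus].sum() / (t2 - t1)
    frequency = sum(integ[i] * R[i, j] for i in minus for j in plus)
    return duration, frequency

print(state_probabilities(5.0))
print(interval_indices(0.0, 5.0, [2]))   # subset X+ taken as the last state only
```

The eigen-decomposition is computed once, so indices for many intervals or many subsets cost only a few small matrix products each.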



Method 2 

If the initial state of the process is known, the differential equation becomes

$$p'(t) = p(t)\,R$$

The initial row vector $p(0)$ is such that its $i$th element $p_i(0)$ is equal to 1 if the
process started in the $i$th state, otherwise it is 0. The process can be segmented
into steps of a very small length $\Delta t$ as shown in the last chapter and the
probability vector at time $t = j\Delta t$ is

$$p(j\Delta t) = p((j-1)\Delta t)\,[I + R\,\Delta t] \tag{3.14}$$

The state probabilities can be calculated by repeated application of the above
recursive relationship.



Fractional Duration 

The fractional duration, using this technique, can be calculated using a discrete
time equivalent of (3.10). Assuming $p(j\Delta t)$ to be constant over the interval
$(j\Delta t, (j+1)\Delta t)$

$$D(0, t) = \frac{1}{t} \sum_{j=0}^{m} p(j\Delta t)\,\Delta t \tag{3.15}$$

where

$D(0, t)$ = The row vector whose $i$th entry, $D_i(0, t)$, represents the
fractional duration of state $i$ in the interval $(0, t)$.

Therefore

$$D_+(0, t) = \sum_{i \in X^+} D_i(0, t)$$



Interval Frequency 

The interval frequency in the interval $(0, t)$ can be evaluated using a discrete time
approximation of (3.11), i.e.

$$F_+(0, t) = t \sum_{i \in X^-} D_i(0, t) \sum_{j \in X^+} \lambda_{ij} \tag{3.16}$$

The above expression is readily understood by realizing that $t\,D_i(0, t)$ denotes
the expected duration in state $i$, which when multiplied by the constant
transition rate yields the expected number of transitions.
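The recursive scheme of Method 2 can be sketched in a few lines of plain Python. The rate matrix, the rates `lam` and `mu`, the step count `m` and the function names are assumptions chosen for illustration. The sketch sums the probability vector over the `m` steps that make up $(0, t)$, each step being weighted by $\Delta t$, and prints the exact probability of the last state for comparison (the closed form used for that comparison holds only for this particular hypothetical matrix).

```python
import math

def euler_step(p, R, dt):
    """One application of Equation (3.14): p(j dt) = p((j-1) dt) [I + R dt]."""
    n = len(p)
    return [p[k] + dt * sum(p[i] * R[i][k] for i in range(n)) for k in range(n)]

def discrete_indices(p0, R, t, m, plus):
    """Approximate p(t), D+(0, t) and F+(0, t) using m steps of length t / m."""
    dt = t / m
    n = len(p0)
    p = list(p0)
    time_in_state = [0.0] * n            # accumulates p(j dt) * dt, i.e. t * D_i(0, t)
    for _ in range(m):
        for i in range(n):
            time_in_state[i] += p[i] * dt
        p = euler_step(p, R, dt)
    D = [x / t for x in time_in_state]   # discrete form of Equation (3.15)
    minus = [i for i in range(n) if i not in plus]
    D_plus = sum(D[i] for i in plus)
    F_plus = t * sum(D[i] * sum(R[i][j] for j in plus) for i in minus)  # Equation (3.16)
    return p, D_plus, F_plus

lam, mu = 0.1, 1.0                       # illustrative rates
R = [[-2 * lam, 2 * lam, 0.0],           # hypothetical transition rate matrix
     [mu, -(lam + mu), lam],
     [0.0, 2 * mu, -2 * mu]]
p_end, D_plus, F_plus = discrete_indices([1.0, 0.0, 0.0], R, t=5.0, m=20000, plus=[2])

exact_p3 = (lam / (lam + mu)) ** 2 * (1 - math.exp(-(lam + mu) * 5.0)) ** 2
print(p_end[2], exact_p3)                # recursion against the exact value
print(D_plus, F_plus)
```

The step length has to be small compared with the reciprocal of the largest transition rate; otherwise the first order approximation $I + R\Delta t$ is poor and can even produce negative probabilities.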



Steady State Domain 

In many applications the time interval under consideration is very long, the 
stochastic process is remote from the time of origin and therefore the probability 
distribution has reached statistical equilibrium or is in a steady state condition. 
Under these conditions 

$$\lim_{t \to \infty} p_i(t) = p_i$$

Therefore

$$E_{ij}(t) = E_{ij} = \lambda_{ij}\,p_i$$

Equation (3.5) in this limiting condition becomes

$$f_+ = \sum_{i \in X^-} \sum_{j \in X^+} p_i\,\lambda_{ij} \tag{3.17}$$



Now as $t \to \infty$ Equation (3.7) becomes

$$0 = -p_j \sum_{i \ne j} \lambda_{ji} + \sum_{i \ne j} p_i\,\lambda_{ij}$$

That is

$$f_j = \sum_{i \ne j} p_i\,\lambda_{ij} = p_j \sum_{i \ne j} \lambda_{ji} = E_j \tag{3.18}$$

Equation (3.18) describes the frequency balance of state $j$ with the rest of
the state space. This means that the frequency of encountering state $j$ is equal
to the frequency of encountering the rest of the state space from state $j$. The
frequency of encountering a state may therefore be computed either by
calculating the expected transition rate out of the state or into the state. Purely
from a conceptual viewpoint, however, the frequency of encountering a state
is the expected transition rate into the state. This definition holds in both the
transient as well as the steady state domain. If the subset $X^+$ consists of more
than one state, then by following the same reasoning as for (3.7)

$$\sum_{i \in X^+} \frac{dp_i(t)}{dt} = -\sum_{i \in X^+} p_i(t) \sum_{j \in X^-} \lambda_{ij} + \sum_{j \in X^-} p_j(t) \sum_{i \in X^+} \lambda_{ji}$$

As $t \to \infty$

$$f_+ = \sum_{j \in X^-} p_j \sum_{i \in X^+} \lambda_{ji} = \sum_{i \in X^+} p_i \sum_{j \in X^-} \lambda_{ij} = f_- \tag{3.19}$$

In further treatment, the steady state frequency will be simply denoted as 
frequency. Equations (3.17) and (3.19) are of fundamental importance in 
determining the frequency of cumulative or individual states. 
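The frequency balance of a subset can be verified numerically once the steady state probabilities are known. In the sketch below the rate matrix is hypothetical and the steady state is obtained by simply iterating $p \leftarrow p(I + Rh)$ until it stops changing, which is a convenience chosen for this illustration and not a method prescribed in the text; `h` must be smaller than the reciprocal of the largest exit rate.

```python
def steady_state(R, h, iterations=5000):
    """Iterate p <- p (I + R h) from a uniform start until equilibrium is reached."""
    n = len(R)
    p = [1.0 / n] * n
    for _ in range(iterations):
        p = [p[k] + h * sum(p[i] * R[i][k] for i in range(n)) for k in range(n)]
    return p

def frequency(p, R, source, target):
    """Expected transition rate from subset `source` into subset `target`."""
    return sum(p[i] * R[i][j] for i in source for j in target)

# Hypothetical transition rate matrix; each row sums to zero.
R = [[-0.3, 0.2, 0.1],
     [1.0, -1.3, 0.3],
     [0.5, 2.0, -2.5]]
p = steady_state(R, h=0.3)

x_plus, x_minus = [1, 2], [0]            # an arbitrary partition of the state space
f_plus = frequency(p, R, x_minus, x_plus)    # frequency of encountering X+
f_minus = frequency(p, R, x_plus, x_minus)   # frequency of encountering X-
print(p)
print(f_plus, f_minus)                   # the two agree, as Equation (3.19) requires
```

The same `frequency` function gives the balance of a single state, Equation (3.18), when one of the subsets contains only that state.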



The Frequency, the Cycle Time and the Mean Duration 

The probability of being in $X^+$ at any time in the interval $(t, t + T)$ as
$t \to \infty$ is $P_+ = \sum_{i \in X^+} p_i$ and the corresponding frequency is $f_+$. Therefore, as
$t \to \infty$, the interval frequency

$F_+(t, t + T)$ = The expected number of times $X^+$ is encountered in
the interval $(t, t + T)$

$$F_+(t, t + T) = T f_+$$

Consequently

$T_+$ = The expected time between two successive encounters of $X^+$, i.e. the
mean cycle time

$$T_+ = \frac{T}{F_+(t, t + T)} = \frac{1}{f_+} \tag{3.20}$$

Also

$d_+$ = The mean duration of $X^+$, i.e. the expected time of stay in $X^+$
in one cycle

$$d_+ = T_+ P_+ \tag{3.21}$$

$$= \frac{P_+}{f_+} \tag{3.22}$$



Equations (3.19), (3.20), and (3.22) are the backbone of the frequency and 
duration method of system reliability evaluation. The application of the concepts 
discussed is now illustrated with the help of the following example. 



Example: The state space diagram of two independent and identical components
is shown below in Fig. 3.2. The failure and repair rates of each component are
denoted by $\lambda$ and $\mu$ respectively. The state description is as follows:

State 1 Both components up

State 2 One component is up and the other is down

State 3 Both components down

[Fig. 3.2. State transition diagram of two independent identical components. Transition rates: $2\lambda$ from state 1 to 2, $\lambda$ from state 2 to 3, $2\mu$ from state 3 to 2 and $\mu$ from state 2 to 1.]

It is assumed that initially both the components are up. The calculation of
various indices is now illustrated.






Time Specific Probabilities 

The state differential equations can be written using the concept that the 
expected transition rate into the state minus the expected transition rate out 
of the state equals the rate of change of the state probability, i.e. using 
Equation (3.7) 



STATE 1: $p_1'(t) = -2\lambda\,p_1(t) + \mu\,p_2(t)$

STATE 2: $p_2'(t) = -(\mu + \lambda)\,p_2(t) + p_1(t)\,2\lambda + p_3(t)\,2\mu$

STATE 3: $p_3'(t) = -2\mu\,p_3(t) + p_2(t)\,\lambda$



The Laplace transform of the above equations, using the initial condition
$p_1(0) = 1$, $p_2(0) = 0$, and $p_3(0) = 0$

is as follows

$$s\,\bar p_1(s) = 1 - 2\lambda\,\bar p_1(s) + \mu\,\bar p_2(s)$$

$$s\,\bar p_2(s) = -(\mu + \lambda)\,\bar p_2(s) + \bar p_1(s)\,2\lambda + \bar p_3(s)\,2\mu$$

$$s\,\bar p_3(s) = -2\mu\,\bar p_3(s) + \bar p_2(s)\,\lambda$$



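Solving the above equations

$$\bar p_1(s) = \frac{1}{s + 2\lambda} + \frac{2\lambda\mu\,(s + 2\mu)}{s\,(s + \lambda + \mu)(s + 2\lambda + 2\mu)(s + 2\lambda)}$$

$$\bar p_2(s) = \frac{2\lambda\,(s + 2\mu)}{s\,(s + \lambda + \mu)(s + 2\lambda + 2\mu)}$$

$$\bar p_3(s) = \frac{2\lambda^2}{s\,(s + \lambda + \mu)(s + 2\lambda + 2\mu)}$$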



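Expanding into partial fractions

$$\bar p_1(s) = \frac{1}{(\lambda + \mu)^2} \left[ \frac{\mu^2}{s} + \frac{2\lambda\mu}{s + \lambda + \mu} + \frac{\lambda^2}{s + 2(\lambda + \mu)} \right]$$

$$\bar p_2(s) = \frac{2\lambda}{(\lambda + \mu)^2} \left[ \frac{\mu}{s} + \frac{\lambda - \mu}{s + \lambda + \mu} - \frac{\lambda}{s + 2(\lambda + \mu)} \right]$$

$$\bar p_3(s) = \frac{\lambda^2}{(\lambda + \mu)^2} \left[ \frac{1}{s} - \frac{2}{s + \lambda + \mu} + \frac{1}{s + 2(\lambda + \mu)} \right]$$

Converting the above into the time domain

$$p_1(t) = \frac{1}{(\lambda + \mu)^2} \left[ \mu^2 + 2\lambda\mu\,e^{-(\lambda + \mu)t} + \lambda^2 e^{-2(\lambda + \mu)t} \right]$$

$$p_2(t) = \frac{2\lambda}{(\lambda + \mu)^2} \left[ \mu + (\lambda - \mu)\,e^{-(\lambda + \mu)t} - \lambda\,e^{-2(\lambda + \mu)t} \right]$$

$$p_3(t) = \frac{\lambda^2}{(\lambda + \mu)^2} \left[ 1 - 2e^{-(\lambda + \mu)t} + e^{-2(\lambda + \mu)t} \right]$$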

It should be appreciated that since the two components are statistically 
independent, the above equations could have more easily been derived by 
finding the probabilities of each component and using the product rule of 
probabilities. 
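The product rule mentioned above can be illustrated with a short sketch. It assumes the standard result for a single two-state component that starts in the up state, namely that its probability of being up at time $t$ is $[\mu + \lambda e^{-(\lambda + \mu)t}]/(\lambda + \mu)$; the function names and the numerical rates are illustrative only.

```python
import math

def component_up_probability(lam, mu, t):
    """Probability that one two-state component, up at t = 0, is up at time t."""
    a = lam + mu
    return (mu + lam * math.exp(-a * t)) / a

def state_probabilities(lam, mu, t):
    """(p1, p2, p3) for two independent identical components via the product rule."""
    up = component_up_probability(lam, mu, t)
    down = 1.0 - up
    # both up, exactly one up (two ways), both down
    return up * up, 2.0 * up * down, down * down

lam, mu = 0.1, 1.0                       # illustrative failure and repair rates
for t in (0.0, 0.5, 2.0, 10.0):
    p1, p2, p3 = state_probabilities(lam, mu, t)
    print(t, p1, p2, p3, p1 + p2 + p3)
```

Expanding the three products reproduces term by term the expressions obtained by inverting the Laplace transforms of the state equations.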



Steady State Probabilities 

As $t \to \infty$, the exponential terms disappear. The steady state values are

$$p_1 = \frac{\mu^2}{(\lambda + \mu)^2}$$

$$p_2 = \frac{2\lambda\mu}{(\lambda + \mu)^2}$$

$$p_3 = \frac{\lambda^2}{(\lambda + \mu)^2}$$



The remaining indices are calculated for state 3. The calculation for the other 
states is left to the reader as an exercise. 



Fractional Duration 

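$$D_3(0, T) = \frac{1}{T} \int_0^T p_3(t)\,dt = \frac{\lambda^2}{(\lambda + \mu)^2}\,\frac{1}{T} \left[ T - \frac{2\left(1 - e^{-(\lambda + \mu)T}\right)}{\lambda + \mu} + \frac{1 - e^{-2(\lambda + \mu)T}}{2(\lambda + \mu)} \right]$$

As $T \to \infty$

$$D_3(0, T) \to \frac{\lambda^2}{(\lambda + \mu)^2} = p_3$$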



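Interval frequency

$$F_3(0, T) = \int_0^T p_2(t)\,\lambda\,dt = \frac{2\lambda^2}{(\lambda + \mu)^2} \left[ \mu T + \frac{(\lambda - \mu)\left(1 - e^{-(\lambda + \mu)T}\right)}{\lambda + \mu} - \frac{\lambda\left(1 - e^{-2(\lambda + \mu)T}\right)}{2(\lambda + \mu)} \right]$$

Steady State frequency

$$f_3 = p_2\,\lambda = \frac{2\lambda^2\mu}{(\lambda + \mu)^2}$$

It can be seen that as $T \to \infty$, the average interval frequency approaches the
steady state frequency.

$$\lim_{T \to \infty} \frac{F_3(0, T)}{T} = \frac{2\lambda^2\mu}{(\lambda + \mu)^2} = f_3$$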
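The approach of the interval indices to their steady state values can be tabulated for a chosen pair of rates. The sketch below evaluates the closed form integrals of $p_3(t)$ and $\lambda p_2(t)$ for state 3 of the example, together with the cycle time and mean duration given by Equations (3.20) and (3.22). The numerical rates and the function name are illustrative assumptions.

```python
import math

def state3_indices(lam, mu, T):
    """Interval and steady state indices of state 3 in the two-component example."""
    a = lam + mu
    e1 = 1.0 - math.exp(-a * T)
    e2 = 1.0 - math.exp(-2.0 * a * T)
    # Fractional duration: (1/T) * integral of p3(t) over (0, T)
    D3 = (lam ** 2 / (a ** 2 * T)) * (T - 2.0 * e1 / a + e2 / (2.0 * a))
    # Interval frequency: integral of p2(t) * lam over (0, T)
    F3 = (2.0 * lam ** 2 / a ** 2) * (mu * T + (lam - mu) * e1 / a - lam * e2 / (2.0 * a))
    p3 = lam ** 2 / a ** 2               # steady state probability
    f3 = 2.0 * lam ** 2 * mu / a ** 2    # steady state frequency
    return {"D3": D3, "F3": F3, "avg_F3": F3 / T, "p3": p3, "f3": f3,
            "cycle_time": 1.0 / f3, "mean_duration": p3 / f3}

lam, mu = 0.1, 1.0                       # illustrative rates
for T in (1.0, 10.0, 100.0, 1000.0):
    idx = state3_indices(lam, mu, T)
    print(T, idx["D3"], idx["p3"], idx["avg_F3"], idx["f3"])
```

For short intervals both the fractional duration and the average interval frequency lie below their steady state values, because the system is assumed to start with both components up.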



Mean Cycle Time to the Encounter of State 3.

$$T_3 = \frac{1}{f_3} = \frac{(\lambda + \mu)^2}{2\lambda^2\mu}$$

Mean Duration of State 3.

$$d_3 = \frac{p_3}{f_3} = \frac{1}{2\mu}$$



Frequency Equilibrium in a System of Independent Components 

Equation (3.19) denotes the frequency equilibrium of the subset $X^+$ with the
disjoint subset $X^-$. A case of special interest, from the point of view of
frequency equilibrium, is the system comprised of $N$ independent binary
components. The term 'binary components' is used here in the general sense of
a process which exists in either of two states, up and down, the duration in
each state being assumed exponentially distributed with mean values $m$
and $r$ respectively. The reciprocal of $m$ is $\lambda$, which is the rate of transition from the up to the
down state, and similarly $\mu$, the reciprocal of $r$, denotes the rate of transition
from the down to the up state. In such a system it can be shown that, in
addition to the frequency equilibrium of state $i$ with the disjoint subset
containing the rest of the states, there is also a frequency equilibrium between
any two individual states, i.e.

The frequency of encountering state $i$ from state $j$, i.e. the expected
transitions per unit time from state $j$ to state $i$

= The frequency of encountering state $j$ from state $i$.

This relationship is of considerable value when dealing with independent binary
unit systems.




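Proof: Let the state $i$ comprise $m$ components in the up state and $k$
components in the down state. Whenever a component changes state, the
system transits from state $i$ to the state fitting the resulting description of
component states. Let it be assumed that if the component '0' transits from
the up to the down state, the system transits from state $i$ to $j$. Then the
frequency of encountering state $j$ from state $i$ is

$$E_{ij} = p_i\,\lambda_0 = \frac{\mu_0}{\mu_0 + \lambda_0} \prod_{p \in A} \frac{\mu_p}{\mu_p + \lambda_p} \prod_{q \in B} \frac{\lambda_q}{\mu_q + \lambda_q}\;\lambda_0 \tag{3.23}$$

where $A$ is a set containing all the components in the up state, except the
component '0' which is in the up state but is not contained in the set $A$.
Similarly $B$ is a set containing all the components in the down state.
Also, the frequency of encountering state $i$ from state $j$ is

$$E_{ji} = p_j\,\mu_0 = \frac{\lambda_0}{\mu_0 + \lambda_0} \prod_{p \in A} \frac{\mu_p}{\mu_p + \lambda_p} \prod_{q \in B} \frac{\lambda_q}{\mu_q + \lambda_q}\;\mu_0 \tag{3.24}$$

From Equations (3.23) and (3.24)

$$E_{ij} = E_{ji}$$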
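The pairwise frequency equilibrium for independent binary components can be checked by enumeration. The following sketch is illustrative: the component rates and function names are assumptions. It forms each system state probability as a product of component probabilities and compares, for every pair of states differing in one component, the frequency of the failure transition with the frequency of the matching repair transition.

```python
from itertools import product

def state_probability(state, lam, mu):
    """Steady state probability of a system state; state[c] is True if component c is up."""
    prob = 1.0
    for up, l, m in zip(state, lam, mu):
        prob *= m / (l + m) if up else l / (l + m)
    return prob

def largest_imbalance(lam, mu):
    """Largest |p_i * lambda_c - p_j * mu_c| over all pairs of states differing in one component."""
    n = len(lam)
    worst = 0.0
    for state in product([True, False], repeat=n):
        for c in range(n):
            if state[c]:
                failed = state[:c] + (False,) + state[c + 1:]
                forward = state_probability(state, lam, mu) * lam[c]     # i to j
                backward = state_probability(failed, lam, mu) * mu[c]    # j to i
                worst = max(worst, abs(forward - backward))
    return worst

lam = [0.1, 0.2, 0.05]                   # hypothetical failure rates
mu = [1.0, 2.0, 0.5]                     # hypothetical repair rates
print(largest_imbalance(lam, mu))        # zero up to rounding error
```

The balance holds for components with different rates as well as for identical ones, since only the factor belonging to the component that changes state differs between the two states.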



Alternative Interpretation of Mean Cycle Time, Mean Duration and Mean 
Frequency 

The emphasis in the preceding sections is on the frequency as an expectation of 
the state encounter rate. The mean cycle time is then obtained as the reciprocal 
of the mean frequency. This section describes an alternative approach by first 
deriving the expressions for mean times and then deducing the frequency 
relationships. 

This treatment is intended for theoretically inclined readers and requires some 
acquaintance with renewal theory. This section may be omitted without any 
loss of continuity. 

The entire state space is again assumed to be partitioned into two disjoint
subsets $X^+$ and $X^-$.
Let

$U$ = The random variable specifying the time of uninterrupted wandering
of the system among the states $\{i \in X^+\}$, i.e. starting in $i \in X^+$, this is
the time which the system spends in subset $X^+$ before once getting
out.

$D$ = The random variable defining the time of uninterrupted wandering
of the system among the states $\{i \in X^-\}$.

The sequence of random variables $U$ and $D$ defines an alternating renewal
process. This is shown in Fig. 3.3. The random variable $U + D$ specifies the
cycle time, i.e. the time between two successive encounters of subset $X^+$ or
$X^-$. The quantity of interest is the mean value $T_m$ of the random variable $U + D$,
i.e. $E(U + D) = E(U) + E(D)$ where





[Fig. 3.3. The alternating renewal process $(U, D)$ defined on the state space $X$. Open circles mark renewals of $X^+$, filled circles mark renewals of $X^-$, and each cycle time is $U_i + D_i$.]

$E(U)$ and $E(D)$ are respectively the mean values $T_u$, $T_d$ of the random variables
$U$ and $D$. In long term or steady state analysis the interval of interest is $(t, t + T)$,
$t \to \infty$. If the origin is, however, fixed at $t$, i.e. the time is represented by $x$ such
that $x = 0$ at $t$, then the origin of the process is at $x \to -\infty$, i.e. the process started
remote from the time origin. Such a process is called an equilibrium alternating
renewal process.



Let

$S_U(u)$ = The survival function of $U$
$= P\{U > u\}$

For the equilibrium renewal process, the residual life time of the random
variable $U$, i.e. the time $U_r$ from a random instant $t \in U$ (i.e. under the condition
that the system is in subset $X^+$ at the instant $t$) to the termination of $U$, has the
probability density function

$$f_{U_r}(x) = \frac{S_U(x)}{T_u}$$

The survival function $S_{U_r}(x)$ of the residual life time $U_r$ of $U$ therefore is

$$S_{U_r}(x) = \frac{1}{T_u} \int_x^{\infty} S_U(u)\,du$$

Differentiating this expression

$$\frac{dS_{U_r}(x)}{dx} = -\frac{1}{T_u}\,S_U(x)$$

Since $S_U(0) = 1$

$$\left. \frac{dS_{U_r}(x)}{dx} \right|_{x = 0} = -\frac{1}{T_u} \tag{3.25}$$

Now

$$S_{U_r}(x) = P\{U_r > x\} = \sum_{i \in X^+} P_i^*\,\eta_i(x) \tag{3.26}$$

where

$P_i^* = P\{Z = i \mid i \in X^+\}$, i.e. the conditional steady state probability that
the system is in the $i$th state given that it is in the subset $X^+$

and

$\eta_i(x)$ = The probability that the system which begins to operate in
state $i \in X^+$ at some random instant of time in the equilibrium
process will not once get out of $X^+$ during $x$.

It is obvious that

$$P_i^* = \frac{p_i}{\sum_{i \in X^+} p_i} \tag{3.27}$$



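Since the transition rates $\lambda_{ik}$ are assumed constant, the transitions from state $i$
to state $k$ are governed by random variables which have exponential probability
density functions. The probability density and the survival functions of the $k$th
process will be

$$f_{ik}(x) = \lambda_{ik}\,e^{-\lambda_{ik}x}$$

and

$$S_{ik}(x) = e^{-\lambda_{ik}x}$$

When the probability density function is exponential, the instant of entry does
not affect the probability density function, i.e. the probability density function
of the residual life time of the random variable is the same as that of the random
variable. Therefore, the probability that the system which begins to operate in
the state $i \in X^+$ at some random instant of time will not get out of state
$i$ during $x$ and will then leave it for state $j$ is

$$\eta_{ij}(x) = \int_x^{\infty} f_{ij}(u) \prod_{k \ne j} S_{ik}(u)\,du = \int_x^{\infty} \lambda_{ij} \exp\left\{ -\sum_k \lambda_{ik}\,u \right\} du$$

Therefore

$$\eta_{ij}(x) = \lambda_{ij} \left( \sum_k \lambda_{ik} \right)^{-1} \exp\left\{ -\sum_k \lambda_{ik}\,x \right\}$$

where the sums and products over $k$ run over all states other than $i$. The system
does not get out of $X^+$ during $x$ either if it stays in state $i$ throughout $x$, or if
it transits at some instant $x - y$ to a state $j \in X^+$ and does not get out of $X^+$ in
the remaining time $y$. Therefore

$$\begin{aligned}
\eta_i(x) &= \sum_j \eta_{ij}(x) + \sum_{j \in X^+} \int_0^x f_{ij}(x - y) \prod_{k \ne j} S_{ik}(x - y)\,(1 - Q_j(y))\,dy \\
&= \exp\left\{ -\sum_k \lambda_{ik}\,x \right\} + \sum_{j \in X^+} \lambda_{ij} \exp\left\{ -\sum_k \lambda_{ik}\,x \right\} \int_0^x (1 - Q_j(y)) \exp\left\{ \sum_k \lambda_{ik}\,y \right\} dy
\end{aligned}$$

where $Q_j(y)$ is the probability distribution of the time at which the system gets
out of $X^+$ given that the system entered state $j \in X^+$ at $y = 0$.

Substituting into (3.26)

$$S_{U_r}(x) = \sum_{i \in X^+} P_i^* \left[ \exp\left\{ -\sum_k \lambda_{ik}\,x \right\} + \sum_{j \in X^+} \lambda_{ij} \exp\left\{ -\sum_k \lambda_{ik}\,x \right\} \int_0^x (1 - Q_j(y)) \exp\left\{ \sum_k \lambda_{ik}\,y \right\} dy \right]$$

$$\begin{aligned}
\frac{dS_{U_r}(x)}{dx} = \sum_{i \in X^+} P_i^* \Bigg[ &-\sum_k \lambda_{ik} \exp\left\{ -\sum_k \lambda_{ik}\,x \right\} \\
&+ \sum_{j \in X^+} \lambda_{ij} \exp\left\{ -\sum_k \lambda_{ik}\,x \right\} (1 - Q_j(x)) \exp\left\{ \sum_k \lambda_{ik}\,x \right\} \\
&- \sum_{j \in X^+} \lambda_{ij} \sum_k \lambda_{ik} \exp\left\{ -\sum_k \lambda_{ik}\,x \right\} \int_0^x (1 - Q_j(y)) \exp\left\{ \sum_k \lambda_{ik}\,y \right\} dy \Bigg]
\end{aligned}$$

and, since $Q_j(0) = 0$,

$$\left. \frac{dS_{U_r}(x)}{dx} \right|_{x = 0} = -\sum_{i \in X^+} P_i^* \left[ \sum_k \lambda_{ik} - \sum_{j \in X^+} \lambda_{ij} \right] = -\sum_{i \in X^+} P_i^* \sum_{j \in X^-} \lambda_{ij}$$

Substituting into (3.25) and substituting the value of $P_i^*$ from (3.27)

$$T_u = \sum_{i \in X^+} p_i \left( \sum_{i \in X^+} p_i \sum_{j \in X^-} \lambda_{ij} \right)^{-1} \tag{3.28}$$

Similarly

$$T_d = \sum_{i \in X^-} p_i \left( \sum_{i \in X^-} p_i \sum_{j \in X^+} \lambda_{ij} \right)^{-1} \tag{3.29}$$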



The expressions for $T_u$ and $T_d$ have been derived by examining the
distribution of the states constituting $X^+$ and $X^-$ under stationary conditions.
The behaviour of the alternating renewal process $\{U, D\}$ is now examined in
detail. Since $x = 0$ is at $t \to \infty$ (see Fig. 3.3)

$P_U(x = 0)$ = The probability that the system is in $X^+$ at the
observation origin, given the process started a long
time ago

$$P_U(x = 0) = P_U(x) = \sum_{i \in X^+} p_i = \frac{T_u}{T_u + T_d} \tag{3.30}$$

Similarly

$$P_D(x = 0) = P_D(x) = \sum_{i \in X^-} p_i = \frac{T_d}{T_u + T_d} \tag{3.31}$$

Substituting (3.30) and (3.31) into (3.28) and (3.29) respectively

$$T_m = T_u + T_d = \left( \sum_{i \in X^+} p_i \sum_{j \in X^-} \lambda_{ij} \right)^{-1} = \left( \sum_{i \in X^-} p_i \sum_{j \in X^+} \lambda_{ij} \right)^{-1} \tag{3.32}$$



This expression for the cycle time is the same as that derived using the
expectation concept. The frequency of encountering $X^+$ at $x$ is the same as
the renewal density of the random variable $U + D$. The renewal density is
defined as

$$h(x) = \lim_{\Delta x \to 0^+} \frac{E[N_{x,\,x + \Delta x}]}{\Delta x}$$

where

$N_{x,\,x + \Delta x}$ = The random variable representing the number of renewals
in $(x, x + \Delta x)$

$E$ denotes the expectation

Now, let

$h_{ij}(x)$ = The renewal density of subset $j$ given that the system is in $i$
at $x = 0$



It can be shown that for a modified renewal process, i.e. a sequence of
independent random variables in which all the random variables are identically
distributed except the first one which has a different distribution, the expression
for renewal density is

$$\bar h(s) = \frac{\bar f_1(s)}{1 - \bar f(s)}$$

where $\bar f_1(s)$, $\bar f(s)$ and $\bar h(s)$ are the Laplace transforms of $f_1(x)$, $f(x)$ and $h(x)$
respectively.

The quantities $f_1(x)$ and $f(x)$ are the probability density functions of the
first and the subsequent random variables respectively.

1. Determination of $\bar h_{X^+X^-}(s)$

If the system is assumed to be in subset $X^+$ at $x = 0$, the first renewal of $X^-$ or
the encounter of $X^+$ occurs at $U_1' + D_1$, where the subscript indicates the
number counted from $x = 0$ and not $t = 0$ (see Fig. 3.3) and the prime
indicates that the distribution of the first $U$ is different from the subsequent
ones. The second renewal of $X^-$ is at $U_2 + D_2$ and so on.

Now

$$f_{U_1'}(x) = \frac{S_U(x)}{T_u}$$

and

$$f_{D_1}(x) = f_D(x)$$

Therefore

$$\bar f_1(s) = \frac{(1 - \bar f_U(s))\,\bar f_D(s)}{T_u\,s}$$

and

$$\bar f(s) = \bar f_U(s)\,\bar f_D(s)$$

Therefore

$$\bar h_{X^+X^-}(s) = \frac{(1 - \bar f_U(s))\,\bar f_D(s)}{T_u\,s\,(1 - \bar f_U(s)\,\bar f_D(s))} \tag{3.33}$$



2. Determination of $\bar h_{X^-X^-}(s)$

$$\bar f_1(s) = \frac{1 - \bar f_D(s)}{T_d\,s}$$

and

$$\bar f(s) = \bar f_U(s)\,\bar f_D(s)$$

Therefore

$$\bar h_{X^-X^-}(s) = \frac{1 - \bar f_D(s)}{T_d\,s\,(1 - \bar f_U(s)\,\bar f_D(s))} \tag{3.34}$$

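Now

$$f_+(x) = h_{X^-}(x) = h_{X^+X^-}(x)\,P\{Z(0) = i \in X^+\} + h_{X^-X^-}(x)\,P\{Z(0) = i \in X^-\}$$

i.e.

$$\bar f_+(s) = \bar h_{X^-}(s) = \bar h_{X^+X^-}(s)\,P_+ + \bar h_{X^-X^-}(s)\,P_- = \frac{1}{s\,(T_u + T_d)}$$

Converting

$$f_+(x) = f_+ = \frac{1}{T_m} = \sum_{i \in X^+} p_i \sum_{j \in X^-} \lambda_{ij} = \sum_{i \in X^-} p_i \sum_{j \in X^+} \lambda_{ij} = f_- \tag{3.35}$$

since it can be proved similarly that

$$f_-(x) = f_- = \frac{1}{T_m}$$

Also from (3.30)

$T_u$ = The mean duration in $X^+$

$$T_u = T_m P_+ = \frac{P_+}{f_+} \tag{3.36}$$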




It can, therefore, be seen that the expressions for frequency, mean cycle time 
and mean duration are the same as those derived using the expectation concept. 

The Relationship to Average Values 

It is quite well known that the arithmetic mean of a variable tends to its
expected value as the number of trials becomes large. This section examines its
application to the cycle time, the mean duration and the frequency indices. It
can be seen from Fig. 3.3 that the cycle time between two successive encounters
of $X^+$ is characterized by the random variable $T = U + D$. Thus $T_1 = U_1 + D_1$ is
the cycle time to the first encounter of $X^+$ and $T_i = U_i + D_i$ is the cycle time
between the $(i-1)$th and the $i$th encounter of $X^+$. The random variables $T_i$ are
independently and identically distributed with mean $T_m$. Assume the random
variable $T_i$ to be observed $n$ times and define the random variable

$$\bar T = \frac{T_1 + T_2 + \cdots + T_n}{n}$$

Then for any constant $\epsilon > 0$

$$\lim_{n \to \infty} P\{|\bar T - T_m| > \epsilon\} = 0$$

This can be interpreted to mean that, as the number of encounters of $X^+$ increases,
the average cycle time converges in probability to the mean cycle time.
The mean cycle time found by Equation (3.32) is therefore the long run
average interval between two successive encounters of $X^+$. Similarly it can
be seen that $T_u$ and $T_d$ are the long run average residence periods of the system
in $X^+$ and $X^-$. Since the frequency is the reciprocal of the mean cycle time, it
can be appreciated that it is also a long term average.
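The long run average interpretation can be illustrated by a small Monte Carlo experiment on the two-component example of this chapter, with the subset of interest taken as state 3 (both components down). The sketch is illustrative only: the rates, the seed, the number of cycles and the function name are assumptions. It records the interval between successive encounters of state 3 and the length of each stay in it, then averages both.

```python
import random

def simulate_state3_cycles(lam, mu, n_cycles, seed=1):
    """Average cycle time between encounters of state 3 and average stay in state 3."""
    rng = random.Random(seed)
    # Transitions of the two-component example: state -> [(next state, rate), ...]
    transitions = {1: [(2, 2 * lam)],
                   2: [(1, mu), (3, lam)],
                   3: [(2, 2 * mu)]}
    state, clock = 1, 0.0
    last_entry = None
    cycle_times, stays = [], []
    while len(cycle_times) < n_cycles:
        options = transitions[state]
        total_rate = sum(rate for _, rate in options)
        dwell = rng.expovariate(total_rate)   # exponential residence time
        if state == 3:
            stays.append(dwell)
        clock += dwell
        # Pick the next state with probability proportional to its rate.
        pick = rng.random() * total_rate
        next_state = options[-1][0]
        for candidate, rate in options:
            if pick < rate:
                next_state = candidate
                break
            pick -= rate
        state = next_state
        if state == 3:
            if last_entry is not None:
                cycle_times.append(clock - last_entry)
            last_entry = clock
    return sum(cycle_times) / len(cycle_times), sum(stays) / len(stays)

lam, mu = 1.0, 2.0                       # illustrative rates
mean_cycle, mean_stay = simulate_state3_cycles(lam, mu, n_cycles=40000)
print(mean_cycle, (lam + mu) ** 2 / (2 * lam ** 2 * mu))   # simulated and analytical cycle time
print(mean_stay, 1 / (2 * mu))                              # simulated and analytical mean duration
```

The averages settle towards the analytical values as the number of observed cycles grows, and the reciprocal of the average cycle time gives the corresponding estimate of the frequency.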

The Concept of Equivalent Transition Rate 

The concept of equivalent transition rate plays an important role in system
reliability evaluation. Equation (3.5) can be written as

$$f_+(t) = \sum_{i \in X^-} \sum_{j \in X^+} p_i(t)\,\lambda_{ij} = P_-(t)\,\lambda_{X^-X^+}(t)$$

where

$$P_-(t) = \sum_{i \in X^-} p_i(t)$$

and

$\lambda_{X^-X^+}(t)$ = The equivalent transition rate from subset $X^-$ to $X^+$.

Therefore

$$\lambda_{X^-X^+}(t) = \frac{\sum_{i \in X^-} \sum_{j \in X^+} p_i(t)\,\lambda_{ij}}{\sum_{i \in X^-} p_i(t)} \tag{3.37}$$

The most important application of this concept is in reducing the system
state space. The states can be merged and the equivalent transition rate from
the merged states found by the application of Equation (3.37). The full
implications of the equivalent transition rate and the limitations on its use are
outlined in Chapter 5 while deriving conditions of mergeability.
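A minimal sketch of the equivalent transition rate calculation follows. It uses the steady state probabilities of the two-component example with states 1 and 2 merged into a single subset; the numerical rates and the function name are illustrative assumptions, and zero-based list indices stand for states 1, 2 and 3.

```python
def equivalent_rate(p, rates, source, target):
    """Equation (3.37): equivalent transition rate from subset `source` to subset `target`."""
    flow = sum(p[i] * rates[i][j] for i in source for j in target)
    return flow / sum(p[i] for i in source)

lam, mu = 0.1, 1.0                       # illustrative rates
a = lam + mu
p = [mu ** 2 / a ** 2, 2 * lam * mu / a ** 2, lam ** 2 / a ** 2]   # steady state p1, p2, p3
rates = [[0.0, 2 * lam, 0.0],            # rates[i][j] holds lambda_ij
         [mu, 0.0, lam],
         [0.0, 2 * mu, 0.0]]

rate_to_failure = equivalent_rate(p, rates, source=[0, 1], target=[2])
rate_to_success = equivalent_rate(p, rates, source=[2], target=[0, 1])
print(rate_to_failure, rate_to_success)
```

With steady state probabilities, the reciprocal of each equivalent rate equals the mean duration of the subset it leaves, so the merged model reproduces the frequency and mean duration of the original subsets.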

References 

1. D.R. Cox, Renewal Theory, Methuen's Monographs on Applied
Probability and Statistics, Methuen, London (1962).

2. D.R. Cox and H.D. Miller, The Theory of Stochastic Processes, Methuen,
London (1965).

3. R.A. Howard, Dynamic Probabilistic Systems, John Wiley (1971).

4. C. Singh and R. Billinton, A Frequency and Duration Approach to Short
Term Reliability Evaluation, IEEE Trans., PAS-92, No. 6 (Nov./Dec. 1973).

5. C. Singh, 'Reliability Modelling and Evaluation in Electric Power Systems',
Ph.D. thesis, University of Saskatchewan, Canada (1972).

6. I.A. Ushakov, Mean Time for Operation for a Semi-Markov Process,
Engrg. Cybernetics, 4 (1969).



CHAPTER 4 

System Reliability 



Introduction 

A system is a set of components arranged to accomplish a purpose or purposes 
under a given set of conditions. This chapter is concerned with determining the 
reliability characteristics of the system from the statistical information available 
on the failure and repair cycles of the components. Non-maintained systems are 
treated extensively in the literature and therefore the attention in this book is 
directed towards maintainable systems. In these systems, restorative action is 
initiated immediately after the failure and reliability modelling and evaluation 
is generally more complicated than in non-maintained cases. The theory and
procedure are, however, quite general and can equally be applied to
non-maintainable systems.

Numerical values of reliability measures can be obtained either through 
simulation or by solving mathematical models. The simulation approach is 
discussed in Chapter 7. This chapter describes various mathematical approaches 
to system reliability evaluation. The essential theory has already been outlined 
in the previous two chapters. Different approaches to system representation 
and solution are described. The following preliminary analysis is basic to all the 
approaches. 



Definition and Description of the System and its Requirements 

The first step in developing a reliability model is to define and describe the system 
and its requirements. The system must be categorized into major subsystems 
and the function of each subsystem and the interface between them defined. It 
is helpful to prepare a functional diagram showing the interaction of the various 
subsystems. A statement of the function of each subsystem and component 
should also be prepared. This type of information is easy to obtain for existing 
systems or systems where the design is in the final stage. At the early stages of a 
project, a good description of the system may not exist and the reliability 
engineer may have to come up with a reasonable description of the system 
from the study reports, development plans and specifications. Discussions with 
other personnel engaged on the project are very helpful. The system can
usually be broken down into convenient blocks, and most often systems are
designed on that basis. In terms of developing a model for the reliability
calculations, it may sometimes be preferable to group the components from 
one natural subsystem into another to create independent subsystems as this 
facilitates the reliability calculations. 

Failure Modes and Effects Analysis 

The analysis has to start at the level at which the information is available or can 
be made available. The term component is used in a general sense as it may in 
itself be a subsystem. For example, in generating capacity reliability analysis, a 
generating unit is regarded as a component whereas it is in itself a complex of 
components. The failure modes of the components should be investigated. The 
different modes are recognized from the different effects which they may have 
on the system. Some components may have just one mode of failure whereas 
others may have several. A relay or a circuit breaker may, for example, fail in an 
open or closed position. These modes generally have different effects on the 
system. A diode may for example fail as open circuited or short circuited and 
these failures develop into different system contingencies. After the different 
modes have been identified, their effects on the system should be studied. To 
assure complete systematic coverage of the effects of various failure modes, it 
is useful to employ some type of recording forms. It is again stressed that the 
identification of the failure modes is based on the different effects which these 
modes may have on the system. After the first analysis has been performed it 
may turn out that some of the effects, different though they may be, can be 
regarded as identical from the point of view of reliability analysis. It may 
therefore be possible to reduce the modes of failure. It is advisable to keep the 
failure modes to a minimum both from the point of view of reducing the 
complexity of the problem as well as from the point of view of limited 
statistical data. It is also sometimes possible to regard the different failure modes 
as independent without introducing any significant error. In such a case the 
component can be represented by a number of independent binary components
equal to the number of failure modes.

The detailed form of the above qualitative analysis is commonly referred to as
FMEA (Failure Mode and Effects Analysis). In certain programs which involve the
limited production of complex and costly new hardware, it may not be 
possible to go much beyond this point and certain conclusions regarding design 
weaknesses may be derived from this type of analysis. Criticality analysis may 
be further added by ranking the critical failure mode effects according to 
the probability of occurrence. This is then called FMECA (Failure Mode, Effects 
and Criticality Analysis). This has been used extensively by NASA and other 
defence industries and is now finding its way into the analysis of commercial 
systems. 

The purpose of the above two steps is to deepen systematically the 
understanding of the system in terms of the various component failure
contingencies. In some cases the system may be simple or so well known that it
may be possible to by-pass this analysis because it is intuitively known to the 
reliability engineer. Assuming that the above analysis has been performed or the 
system is intuitively well understood, the techniques for mathematically deriving 
the reliability measures can be broadly classified as: 

i. state space approach 

ii. decomposition using conditional probability approach 
iii. network method

These techniques are now described in detail.

State Space Approach

A component may assume various states depending upon its failure and 
restorative modes. The system state describes the states of the components 
and the environment in which the system is operating. The set of all the 
possible states of the system is called the state space or event space. If the 
environment can exist in m states and the n two-state components of the system are
independent in each environment state, then the state space consists of $m \cdot 2^n$
states. The number of states is, however, modified because of the dependency
restrictions. The state space approach involves the following steps: 

i. Enumerate all possible system states. 

ii. Determine the interstate transition rates. If a diagram is drawn showing the 
various states and the interstate transition rates, it is called the state 
transition diagram. 

iii. If the components are independent, the system state probabilities may be 
found from the component state probabilities by the product rule. In case 
of dependent failures, the system equations are formed and solved for the 
state probabilities. 

iv. The states are then grouped into subsets depending upon the requirements 
of the analysis. In most cases, measures are required only for success or 
failure but in some cases the indices may be computed for a graded mode of 
operation. After the grouping has been done, the subset probabilities, 
frequencies, cycle time and mean duration can be computed by the 
application of the formulae given in Chapter 3. 

This approach is conceptually general and flexible and makes it possible to 
take into account various dependent failures. In large systems, especially involving 
dependent failures, it may be difficult to apply this technique. The methods for 
overcoming these difficulties are discussed in the next chapter. This approach 
is illustrated with the help of the following simple example. 






System Description 

The system consists of line 1 and line 2 supplying electric power to bus C from 
buses A and B. The supply at points A and B is considered to be perfectly 
reliable. Line 1 supplies 75% of the total power and line 2 supplies the 
remaining 25%. The lines can exist either in the 'up' or the 'down' state and the 
repair facilities are sufficient to perform repair on both the lines at the same 
time. The system is considered failed when it is supplying less than 75% of the 
power. 

State Transition Diagram 

In evolving the state transition diagram it is usually convenient to start from the 
state in which all the components are in a working condition. This condition is 
represented by state 1 in Fig. 4.1. The states to which the system can transit from 
state 1 are determined by the component transition modes and either of the 
lines can fail giving state 2 or 3. If the up times of both the lines are exponentially
distributed with mean up times $T_1$ and $T_2$, the transition rates from state 1 to
states 2 or 3 are constant and given by

$$\lambda_1 = \frac{1}{T_1}, \qquad \lambda_2 = \frac{1}{T_2}$$

This is shown in step 1.

Consider states 2 and 3 where one component is failed and the other is 
working. The system can change state either because of the failure of the working 
component or because of the repair of the failed component. If component 1 
were to fail in state 2, the resulting state description would be (1D, 2D) as shown
by state 4 in step 2 in Fig. 4.1. Since the failure rate $\lambda_1$ is constant for
component number 1, it is unaffected by the time of residence in state 1 or
state 2. If, however, the distribution were non-exponential, the transition rate 
would be dependent on times of residence and such a simple representation is 
not possible. This will be treated in detail in Chapter 6. Now considering state 3, 
if component 2 fails the system will transit to state 4 already generated but if 
component 1 is repaired, the system goes back to state 1. This is shown in 
step 2. 

In state 4, both the components are down. Since both the components 
can be repaired independently, the resulting diagram is shown in step 3. The 
state transition diagram for this system is simple and could have been drawn in
a straightforward manner without going through the above procedure. The
purpose of the above discussion is, however, to illustrate the concept of
evolving the state transition diagram by examining the possible transition
modes of each component in a particular system state. It should be realized that
it is not practical to draw a state transition diagram for relatively large systems.
The above procedure is, however, readily programmable on the computer. The
states in the computer program are represented by binary numbers, usually 0
standing for working components and 1 for failed ones. Special codes are
employed for representing other modes (e.g. stand-by). The computer program
usually evolves the state transition matrix directly and further manipulations 
are done using this matrix. 
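
The procedure above can be sketched roughly as follows for independent two-state components. The function name `build_transition_rates`, the dictionary layout and the numerical rates are assumptions made for illustration; a real program would also need the special codes for other modes mentioned above.

```python
from itertools import product


def build_transition_rates(fail_rates, repair_rates):
    """Return (states, rates) for n independent two-state components.

    A state is a tuple of bits, 0 standing for a working component and 1 for
    a failed one. rates maps (from_state, to_state) to the transition rate.
    """
    n = len(fail_rates)
    states = list(product((0, 1), repeat=n))
    rates = {}
    for state in states:
        for k in range(n):
            nxt = list(state)
            nxt[k] = 1 - state[k]  # flip component k: failure or repair
            rate = fail_rates[k] if state[k] == 0 else repair_rates[k]
            rates[(state, tuple(nxt))] = rate
    return states, rates


# Two lines with assumed failure rates (0.5, 1.0) and repair rates (10, 20).
states, rates = build_transition_rates([0.5, 1.0], [10.0, 20.0])
```

Each state is examined in turn and every component is allowed to exert its transition mode, which mirrors the step-by-step evolution of the diagram described in the text.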



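[Figure: panels Step 1, Step 2 and Step 3 showing states 1 (1U, 2U), 2 (1U, 2D), 3 (1D, 2U) and 4 (1D, 2D) joined by the failure rates $\lambda_1$, $\lambda_2$ and repair rates $\mu_1$, $\mu_2$.]

Fig. 4.1 Evolution of the state transition diagram.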



State Probabilities 

Both the lines are independent and the state probabilities can therefore be 
found by the simple product rule. It should be noted from Fig. 4.1 that when 
all the components are independent, each component can exert all its 
transition modes without any restriction from the other components or the 
environment. If in any system state, one or more of the transition modes of a 
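component are suppressed, the components are no longer independent. Denoting
the probabilities of the ith component being in the up or down state by $p_{iu}(t)$
and $p_{id}(t)$, these values for the initial condition $p_{iu}(0) = 1$, $p_{id}(0) = 0$ can be
written from the results of the first chapter

$$p_{iu}(t) = \frac{\mu_i}{\lambda_i + \mu_i} + \frac{\lambda_i}{\lambda_i + \mu_i}\, e^{-(\lambda_i + \mu_i)t}$$

$$p_{id}(t) = \frac{\lambda_i}{\lambda_i + \mu_i} - \frac{\lambda_i}{\lambda_i + \mu_i}\, e^{-(\lambda_i + \mu_i)t}$$

The steady state values are

$$p_{iu} = \frac{\mu_i}{\lambda_i + \mu_i}, \qquad p_{id} = \frac{\lambda_i}{\lambda_i + \mu_i}$$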



The state probabilities are

$$p_1(t) = p_{1u}(t)\,p_{2u}(t)$$

$$p_2(t) = p_{1u}(t)\,p_{2d}(t)$$

$$p_3(t) = p_{1d}(t)\,p_{2u}(t)$$

and

$$p_4(t) = p_{1d}(t)\,p_{2d}(t)$$

Reliability Measures

Since the system is considered failed if less than 75% of the total electric power 
is transferred, the subset $X^+$ of failed states is

$$X^+ = \{3, 4\}$$

and the disjoint subset $X^-$ is

$$X^- = \{1, 2\}$$

The various measures can now be calculated.

Time specific domain 

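The probability of the system being in the failed state at time t
= the time specific availability of $X^+$

$$A_+(t) = \sum_{i \in X^+} p_i(t) = p_3(t) + p_4(t) = p_{1d}(t)\,\bigl(p_{2u}(t) + p_{2d}(t)\bigr) = p_{1d}(t)$$

$$A_+(t) = \frac{\lambda_1}{\lambda_1 + \mu_1} - \frac{\lambda_1}{\lambda_1 + \mu_1}\, e^{-(\lambda_1 + \mu_1)t} \qquad (4.1)$$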



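Fractional duration

$$D_+(0, T) = \frac{1}{T}\int_0^T A_+(t)\,dt = \frac{\lambda_1}{\lambda_1 + \mu_1}\left[1 + \frac{1}{(\lambda_1 + \mu_1)T}\left(e^{-(\lambda_1 + \mu_1)T} - 1\right)\right] \qquad (4.2)$$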



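Interval frequency

$$F_+(0, T) = \int_0^T \bigl(p_1(t) + p_2(t)\bigr)\lambda_1\,dt = \int_0^T p_{1u}(t)\,\lambda_1\,dt = \frac{\lambda_1}{\lambda_1 + \mu_1}\left[\mu_1 T + \frac{\lambda_1}{\lambda_1 + \mu_1}\left(1 - e^{-(\lambda_1 + \mu_1)T}\right)\right] \qquad (4.3)$$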
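
The closed forms for the fractional duration and the interval frequency can be checked numerically. The sketch below compares them with a simple trapezoidal integration of $p_{1d}(t)$ and $p_{1u}(t)\lambda_1$; the rates, the interval length and the helper `trapezoid` are assumed for illustration only.

```python
import math


def p1d(t, lam, mu):
    """Time specific probability of line 1 being down, starting from up."""
    s = lam + mu
    return lam / s * (1.0 - math.exp(-s * t))


def fractional_duration(T, lam, mu):
    """Closed form corresponding to Equation (4.2)."""
    s = lam + mu
    return lam / s * (1.0 + (math.exp(-s * T) - 1.0) / (s * T))


def interval_frequency(T, lam, mu):
    """Closed form corresponding to Equation (4.3)."""
    s = lam + mu
    return lam / s * (mu * T + lam / s * (1.0 - math.exp(-s * T)))


def trapezoid(f, a, b, steps=20000):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b)) + sum(f(a + k * h) for k in range(1, steps))
    return total * h


lam, mu, T = 0.5, 10.0, 2.0  # assumed values
d_num = trapezoid(lambda t: p1d(t, lam, mu), 0.0, T) / T
f_num = trapezoid(lambda t: (1.0 - p1d(t, lam, mu)) * lam, 0.0, T)
```

Both numerical integrals should agree with the closed forms to within the discretization error of the trapezoidal rule.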



Steady State Analysis 

The steady state measures can be obtained either by finding the limiting values 
of the time specific results or else by the direct application of the steady state 
formula. The latter approach is usually easier. 



System unavailability

= the availability of $X^+$

$$= p_{1d} = \frac{\lambda_1}{\lambda_1 + \mu_1}$$

It should be noted that this result could also have been obtained either from
Equation (4.1) by letting $t \to \infty$ or from Equation (4.2) as $T \to \infty$. The steady
state unavailability can be interpreted, as mentioned before, either as the
probability of being in the failed state at a point very remote from the time of
origin or as the limiting proportion of the interval (0, T) spent in the failed state,
when T is very large.

Frequency of encountering the failed state

$$= (p_1 + p_2)\lambda_1 = p_{1u}\lambda_1 = \frac{\mu_1 \lambda_1}{\mu_1 + \lambda_1} = p_{1d}\,\mu_1$$

This illustrates the frequency balance between the failed and working system
states. The frequency could also be found from Equation (4.3) by letting
$T \to \infty$ in $F_+(0, T)/T$.

Mean cycle time

$$= \frac{1}{\text{Frequency}} = \frac{\mu_1 + \lambda_1}{\mu_1 \lambda_1}$$

Mean duration of the down state

$$= \text{Mean cycle time} \times \text{unavailability} = \frac{1}{\mu_1}$$
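
The steady state steps for the two-line example (product rule, grouping of states, frequency, cycle time and duration) might be carried out as in the sketch below. The numerical rates are assumed for illustration and do not come from the text.

```python
# Assumed failure and repair rates of line 1 and line 2.
lam1, mu1 = 0.5, 10.0
lam2, mu2 = 1.0, 20.0

# Steady state component probabilities.
u1, d1 = mu1 / (lam1 + mu1), lam1 / (lam1 + mu1)
u2, d2 = mu2 / (lam2 + mu2), lam2 / (lam2 + mu2)

# Product rule: 1 = (1U, 2U), 2 = (1U, 2D), 3 = (1D, 2U), 4 = (1D, 2D).
p = {1: u1 * u2, 2: u1 * d2, 3: d1 * u2, 4: d1 * d2}

failed = {3, 4}   # less than 75% of the power is supplied
working = {1, 2}

unavailability = sum(p[i] for i in failed)
frequency = sum(p[i] for i in working) * lam1        # encounters of the failed subset
frequency_back = sum(p[i] for i in failed) * mu1     # encounters of the working subset
mean_cycle_time = 1.0 / frequency
mean_down_time = mean_cycle_time * unavailability
```

The two frequencies agree, which is the frequency balance noted above, and the mean down time reduces to the mean repair time of line 1.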

The following section examines some basic configurations which occur quite 
commonly as systems or subsystems. 






Series System 

Components are considered to be in series when the failure of any one of the 
components causes system failure. This is shown in Fig. 4.2 where the failure of 
a component is equivalent to the removal of the corresponding block. It should
be noted that the actual configuration of the components may or may not be
in series; it is the effect of the failure of each component that is important. There
are now two cases: firstly, when all the components are independent and, secondly,
when after the system failure no further failure is possible so that there can be
only one component in the failed state at any time. These two cases are
considered separately.

Input -- [1] -- [2] -- [3] -- ... -- [n] -- Output

Fig. 4.2 Reliability diagram of n components in series



Independent Case 

Since the components are independent, the state probabilities can be derived 
from the component probabilities by the product rule. The state space will
consist of n+1 subsets of states having 0, 1, ..., n components in the failed state.
The failure of even a single component causes system failure and therefore
all the states except the one in which all the components are working represent
system failure. It is therefore necessary to determine the measures for subset $X^+$
such that

$$X^- = \{\text{state where all components are up}\}$$

$$X^+ = \{\text{all states except the one defined above}\}$$

$$A_+(t) = 1 - A_-(t) = 1 - p_{1u}(t)\,p_{2u}(t) \cdots p_{nu}(t)$$



It can be seen that the above expression will involve exponential terms and the 
fractional duration and the interval frequency of the system failed state can be 
readily found by the application of appropriate formulae. In maintainable 
systems the steady state values are usually of primary importance and can be 
obtained from the following explicit formulae for this case. 



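System Unavailability

$$= 1 - p_{1u}\,p_{2u} \cdots p_{nu} = 1 - \frac{\mu_1 \mu_2 \cdots \mu_n}{(\lambda_1 + \mu_1)(\lambda_2 + \mu_2) \cdots (\lambda_n + \mu_n)}$$

The frequency of encountering the down state

$$= \frac{\mu_1 \mu_2 \cdots \mu_n\,(\lambda_1 + \lambda_2 + \cdots + \lambda_n)}{(\lambda_1 + \mu_1)(\lambda_2 + \mu_2) \cdots (\lambda_n + \mu_n)}$$

Mean cycle time to encounter the system down state

$$= \frac{(\lambda_1 + \mu_1)(\lambda_2 + \mu_2) \cdots (\lambda_n + \mu_n)}{\mu_1 \mu_2 \cdots \mu_n\,(\lambda_1 + \lambda_2 + \cdots + \lambda_n)}$$

Mean down time

$$= (\text{M.C.T.})(\text{Unavailability}) = \frac{(\lambda_1 + \mu_1)(\lambda_2 + \mu_2) \cdots (\lambda_n + \mu_n)}{\mu_1 \mu_2 \cdots \mu_n\,(\lambda_1 + \lambda_2 + \cdots + \lambda_n)} - \frac{1}{\lambda_1 + \lambda_2 + \cdots + \lambda_n}$$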



It should be noted that in the above expression the first term is the mean cycle 
time and the second term can be easily seen to be the mean up time. The 
mean down time is therefore represented as the difference of mean cycle time 
and the mean up time. For a two unit series system the expression for mean 
down time reduces to 



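$$\frac{1}{\lambda_1 + \lambda_2}\left[\frac{(\lambda_1 + \mu_1)(\lambda_2 + \mu_2)}{\mu_1 \mu_2} - 1\right] = \frac{\lambda_1 \mu_2 + \lambda_2 \mu_1 + \lambda_1 \lambda_2}{(\lambda_1 + \lambda_2)\,\mu_1 \mu_2} = \frac{\lambda_1 r_1 + \lambda_2 r_2 + \lambda_1 \lambda_2 r_1 r_2}{\lambda_1 + \lambda_2} \qquad (4.4)$$

where $r_1$ and $r_2$ are the mean down times for components 1 and 2, i.e. $r_i = 1/\mu_i$.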
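
A minimal sketch of the steady state measures for independent series components is given below, together with a check of the two-unit mean down time expression. The function name `series_independent` and the rates are illustrative assumptions.

```python
from math import prod


def series_independent(lams, mus):
    """Steady state measures of a series system of independent components.

    Returns (unavailability, frequency, mean_cycle_time, mean_down_time).
    """
    availability = prod(m / (l + m) for l, m in zip(lams, mus))
    unavailability = 1.0 - availability
    # The only working state is 'all up'; any failure leaves it.
    frequency = availability * sum(lams)
    mean_cycle_time = 1.0 / frequency
    mean_down_time = mean_cycle_time * unavailability
    return unavailability, frequency, mean_cycle_time, mean_down_time


lams, mus = [0.5, 1.0], [10.0, 20.0]  # assumed values
unavail, freq, mct, mdt = series_independent(lams, mus)

# Two-unit expression in terms of the component mean down times r_i = 1/mu_i.
r1, r2 = 1.0 / mus[0], 1.0 / mus[1]
mdt_two_unit = (lams[0] * r1 + lams[1] * r2 + lams[0] * lams[1] * r1 * r2) / sum(lams)
mean_up_time = 1.0 / sum(lams)
```

The mean down time from the general formula agrees with the two-unit expression, and the mean cycle time is the sum of the mean up time and the mean down time.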






Dependent Failure 

In many series systems it is safe to assume that once the system has failed, 
further failures will not occur. In such a case the state transition diagram is 
shown in Fig. 4.3. State 0 corresponds to the working state when all components 
are up. State i, i = 1, ..., n, corresponds to the ith component down. Once the
system enters any of these states it can return only to the working state 0 
because further failures are impossible. The transient solution can be obtained 
by solving the state differential equations. The steady state solution proceeds in 
the following manner. The frequency balance equations can be written as 
follows. 




Fig. 4.3 A series system with dependent failure. 



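$$p_0 \sum_{i=1}^{n} \lambda_i = \sum_{i=1}^{n} p_i \mu_i \qquad (4.5)$$

$$p_i \mu_i = p_0 \lambda_i \qquad (4.6)$$

From Equation (4.6)

$$p_i = \frac{\lambda_i}{\mu_i}\, p_0$$

Substituting these values in the equation

$$p_0 + \sum_{i=1}^{n} p_i = 1$$

gives

$$p_0 = \frac{1}{1 + \sum_{i=1}^{n} \lambda_i/\mu_i}$$

and

$$p_i = \frac{\lambda_i/\mu_i}{1 + \sum_{j=1}^{n} \lambda_j/\mu_j}$$

System unavailability

$$= \sum_{i=1}^{n} p_i = \frac{\sum_{i=1}^{n} \lambda_i/\mu_i}{1 + \sum_{i=1}^{n} \lambda_i/\mu_i}$$

Frequency of encountering the down state

$$= p_0 \sum_{i=1}^{n} \lambda_i = \frac{\sum_{i=1}^{n} \lambda_i}{1 + \sum_{i=1}^{n} \lambda_i/\mu_i}$$

Mean cycle time

$$= \frac{1 + \sum_{i=1}^{n} \lambda_i/\mu_i}{\sum_{i=1}^{n} \lambda_i}$$

Mean Down Time

$$= (\text{M.C.T.})(\text{Unavailability}) = \frac{\sum_{i=1}^{n} \lambda_i/\mu_i}{\sum_{i=1}^{n} \lambda_i} = \frac{\sum_{i=1}^{n} \lambda_i r_i}{\sum_{i=1}^{n} \lambda_i} \qquad (4.7)$$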



Referring to Equation (4.4), in systems of high reliability components, the term
$\lambda_1 \lambda_2 r_1 r_2$ can be neglected and therefore in such cases the mean down time is
approximately the same in both cases.
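
To see how small the neglected term can be, the sketch below evaluates the mean down time for the dependent case and for the independent two-unit case and compares the difference with $\lambda_1 \lambda_2 r_1 r_2/(\lambda_1 + \lambda_2)$. The function name `series_dependent` and the rates are assumptions chosen for illustration.

```python
def series_dependent(lams, mus):
    """Series system in which no further failure occurs once the system is down.

    Returns (unavailability, frequency, mean_down_time).
    """
    ratio = sum(l / m for l, m in zip(lams, mus))
    p0 = 1.0 / (1.0 + ratio)          # probability of the working state
    unavailability = ratio * p0
    frequency = p0 * sum(lams)
    return unavailability, frequency, unavailability / frequency


lams, mus = [0.5, 1.0], [10.0, 20.0]  # assumed values
r = [1.0 / m for m in mus]            # component mean down times

_, _, mdt_dependent = series_dependent(lams, mus)

# Independent two-unit case, written in terms of r_1 and r_2.
mdt_independent = (
    lams[0] * r[0] + lams[1] * r[1] + lams[0] * lams[1] * r[0] * r[1]
) / sum(lams)

difference = mdt_independent - mdt_dependent
neglected_term = lams[0] * lams[1] * r[0] * r[1] / sum(lams)
```

With these assumed rates the two mean down times differ by only a few per cent, and the gap shrinks as the products $\lambda_i r_i$ become smaller.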



Series System with Spare 

Consider the special case of a series system with a spare. The example chosen 
consists of a bank of three single phase transformers. As soon as a transformer 
fails, the entire bank is shut down until the failed unit has been replaced by the 
spare. The repair facilities are assumed to be unrestricted, i.e. as soon as the 
transformer fails, the repair is started irrespective of the fact that another unit 
may also be undergoing repair. The down time consists of two phases, the 



System Reliability 101 



repair time and the change out or the re-installation period. The following 
symbols are used: 

X = failure rate of a single phase transformer 

M = repair rate 

7 - installation rate 

The up time and the down times are assumed to be exponentially distributed. 
Reliability modelling of this system when the down times are not exponentially 
distributed is discussed in Chapter 6. The state transition diagram is shown in 
Fig. 4.4. The subsets X + , X~ representing the system failed and working states 
can be defined as follows : 

$$X^+ = \{3, 4, 5\}$$

$$X^- = \{1, 2\}$$



[Figure: five states. State 1: bank UP, spare 1. State 2: bank UP, spare 0. State 3: bank DN, spare 1. State 4: bank DN, spare 0. State 5: bank DN, spare 2.]

Fig. 4.4 The state transition diagram of a bank of three single
phase transformers with one spare



The state frequency balance equations can be written as

$$3\lambda p_1 = \mu p_2 + \gamma p_5$$

$$(3\lambda + \mu)\,p_2 = \gamma p_3$$

$$(\gamma + \mu)\,p_3 = 3\lambda p_1 + 2\mu p_4$$

$$2\mu p_4 = 3\lambda p_2$$

$$\gamma p_5 = \mu p_3$$

Any four out of the above five equations together with the following
equation can be used to obtain the steady state availabilities.

$$\sum_{i=1}^{5} p_i = 1$$

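The various probabilities obtained by solving these equations are:

$$p_1 = \frac{2\gamma^2\mu^2 + 2\gamma\mu^2(3\lambda + \mu)}{Z}$$

where

$$Z = 9\lambda^2\gamma^2 + 2\mu\,(3\lambda + \mu)(3\lambda + \gamma)(\gamma + \mu)$$

and

$$p_2 = \frac{6\lambda\gamma^2\mu}{Z}, \qquad p_3 = \frac{6\lambda\mu\gamma(3\lambda + \mu)}{Z}$$

$$p_4 = \frac{9\lambda^2\gamma^2}{Z}, \qquad p_5 = \frac{6\lambda\mu^2(3\lambda + \mu)}{Z}$$

$$p_{DN} = p_3 + p_4 + p_5 = \frac{9\lambda^2\gamma^2 + 6\lambda\mu\gamma(3\lambda + \mu) + 6\lambda\mu^2(3\lambda + \mu)}{Z}$$

$$p_{UP} = p_1 + p_2 = \frac{2\gamma^2\mu^2 + 2\gamma\mu^2(3\lambda + \mu) + 6\lambda\gamma^2\mu}{Z}$$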



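The frequency of encountering the subset $X^+$ of the failed states can be
found as:

$$f_{X^+} = \sum_{m \in X^-} \sum_{k \in X^+} p_m \lambda_{mk} = \sum_{k \in X^+} \sum_{m \in X^-} p_k \lambda_{km}$$

It is more convenient here to use the first relationship

$$f_{UP} = f_{DN} = 3\lambda(p_1 + p_2) = 3\lambda\,p_{UP} = \frac{3\lambda\,\bigl(2\gamma^2\mu^2 + 2\gamma\mu^2(3\lambda + \mu) + 6\lambda\gamma^2\mu\bigr)}{Z}$$

The mean cycle time

$$= \frac{Z}{3\lambda\,\bigl(2\gamma^2\mu^2 + 2\gamma\mu^2(3\lambda + \mu) + 6\lambda\gamma^2\mu\bigr)}$$

The mean down time

$$= (\text{M.C.T.}) \times (\text{Unavailability}) = \frac{9\lambda^2\gamma^2 + 6\lambda\mu\gamma(3\lambda + \mu) + 6\lambda\mu^2(3\lambda + \mu)}{3\lambda\,\bigl(2\gamma^2\mu^2 + 2\gamma\mu^2(3\lambda + \mu) + 6\lambda\gamma^2\mu\bigr)}$$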
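
The closed-form probabilities can be checked against the frequency balance equations with exact rational arithmetic, as in the sketch below. The balance equation used for state 4, $2\mu p_4 = 3\lambda p_2$, is inferred here from the model (two units under repair, entered from state 2 by a bank failure) and should be read as an assumption; the numerical rates and the function name are also illustrative.

```python
from fractions import Fraction


def spare_bank_probabilities(lam, mu, gam):
    """Steady state probabilities p1..p5 of the transformer bank with one spare."""
    z = 9 * lam**2 * gam**2 + 2 * mu * (3 * lam + mu) * (3 * lam + gam) * (gam + mu)
    p1 = (2 * gam**2 * mu**2 + 2 * gam * mu**2 * (3 * lam + mu)) / z
    p2 = 6 * lam * gam**2 * mu / z
    p3 = 6 * lam * mu * gam * (3 * lam + mu) / z
    p4 = 9 * lam**2 * gam**2 / z
    p5 = 6 * lam * mu**2 * (3 * lam + mu) / z
    return p1, p2, p3, p4, p5


# Assumed rates: failure, repair and installation.
lam, mu, gam = Fraction(1, 10), Fraction(2), Fraction(12)
p1, p2, p3, p4, p5 = spare_bank_probabilities(lam, mu, gam)

balance_ok = all([
    3 * lam * p1 == mu * p2 + gam * p5,             # state 1
    (3 * lam + mu) * p2 == gam * p3,                # state 2
    (gam + mu) * p3 == 3 * lam * p1 + 2 * mu * p4,  # state 3
    2 * mu * p4 == 3 * lam * p2,                    # state 4 (inferred)
    gam * p5 == mu * p3,                            # state 5
])

frequency = 3 * lam * (p1 + p2)          # encounters of the failed subset
mean_down_time = (p3 + p4 + p5) / frequency
```

Using `Fraction` avoids rounding, so the balance equations can be tested for exact equality rather than within a tolerance.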






Parallel Systems 

When there are a number of components, each one of them partially or
completely serving the purpose, these components are said to be in parallel.
When the components are simultaneously performing the same function so
that the system will be fully available if at least one component is operating, it
is said to be a parallel redundant configuration. The earlier example of two
transmission lines carrying 25% and 75% of the total capacity requirement is
a parallel but not a redundant arrangement. The following section deals with
parallel redundant configurations. If the failure and repair rates of the components
can be regarded as independent of the environment, the components can usually be
regarded as independent. The probability of the system being in the down state
at time t, i.e. the unavailability, can be found by the product rule

$$U(t) = p_{1d}(t)\,p_{2d}(t) \cdots p_{nd}(t)$$

This product will be a function of exponential terms which are easily 
integrable. The fractional duration or the average down time in the interval 
(0,t) can be therefore easily determined. For determining the interval frequency, 
the states in X~ from which the system can transit to X + in a single transition 
have to be identified. These states are called the boundary states and it is these 
states which contribute to interval frequency, the transitions from other states 
of X~ remaining within the subset. In the present example, these states are 
obviously those in which only one component is in the working state. The 
number of such states is obviously n. The time specific frequency of encountering 
the down state is 

$$f_{DN}(t) = \sum_{i=1}^{n} p_{iu}(t)\,\lambda_i \prod_{j \ne i} p_{jd}(t)$$

and

$$f_{DN}(0, t) = \int_0^t f_{DN}(x)\,dx$$

The steady state results which are of main interest can be, however, more readily 
obtained in explicit form 

$$U = \text{Unavailability} = \prod_{i=1}^{n} \frac{\lambda_i}{\lambda_i + \mu_i}$$

It has been explained previously that in the steady state condition, there is a
frequency equilibrium between $X^-$ and $X^+$. The frequency can therefore be
found for either of these two subsets; in practice, the one which is easier to
compute is chosen. In this case the easiest way is to find the frequency
of encountering the up state since there is only one state in the down subset.
Therefore

$$f_{DN} = f_{UP} = \left(\prod_{i=1}^{n} \frac{\lambda_i}{\lambda_i + \mu_i}\right) \sum_{i=1}^{n} \mu_i$$

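The same result is derived for the sake of completeness as encounters from $X^-$
to $X^+$. Since $p_{iu}\lambda_i = p_{id}\mu_i$ for each component,

$$f_{DN} = \sum_{i=1}^{n} p_{iu}\,\lambda_i \prod_{j=1,\, j \ne i}^{n} p_{jd}$$

$$= \sum_{i=1}^{n} \mu_i \prod_{j=1}^{n} p_{jd}$$

$$= \left(\prod_{j=1}^{n} p_{jd}\right) \sum_{i=1}^{n} \mu_i$$

$$= \left(\prod_{i=1}^{n} \frac{\lambda_i}{\lambda_i + \mu_i}\right) \sum_{i=1}^{n} \mu_i$$

which is the same as derived above. The mean cycle time to encounter the
down state is the reciprocal of the frequency. The mean down time

$$= \frac{U}{f_{DN}} = \frac{1}{\sum_{i=1}^{n} \mu_i}$$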
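
The two routes to the failure frequency of a parallel redundant system can be compared numerically. In the sketch below the function names and the three sets of rates are assumed for illustration.

```python
from math import prod


def parallel_redundant(lams, mus):
    """Steady state unavailability, frequency and mean down time."""
    unavailability = prod(l / (l + m) for l, m in zip(lams, mus))
    frequency = unavailability * sum(mus)   # leaving the single all-down state
    mean_down_time = unavailability / frequency
    return unavailability, frequency, mean_down_time


def frequency_from_boundary_states(lams, mus):
    """Frequency counted as encounters from the working subset.

    The boundary states are those with exactly one component still working.
    """
    n = len(lams)
    up = [m / (l + m) for l, m in zip(lams, mus)]
    down = [l / (l + m) for l, m in zip(lams, mus)]
    total = 0.0
    for i in range(n):
        others_down = prod(down[j] for j in range(n) if j != i)
        total += up[i] * lams[i] * others_down
    return total


lams, mus = [0.5, 1.0, 2.0], [10.0, 20.0, 40.0]  # assumed values
unavail, freq, mdt = parallel_redundant(lams, mus)
freq_boundary = frequency_from_boundary_states(lams, mus)
```

Both counts give the same frequency, and the mean down time depends only on the sum of the repair rates.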



The m/n Parallel System 

In some systems of identical parallel components, m out of n components may
be required for successful operation. These are termed m/n parallel systems and
can be represented as shown in Fig. 4.5.

Fig. 4.5 The m/n Representation



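If the components are independent, the reliability measures can be easily
derived. Assuming identical failure and repair rates $\lambda$ and $\mu$

U = Unavailability

$$= \binom{n}{m-1}\left(\frac{\mu}{\lambda + \mu}\right)^{m-1}\left(\frac{\lambda}{\lambda + \mu}\right)^{n-m+1} + \binom{n}{m-2}\left(\frac{\mu}{\lambda + \mu}\right)^{m-2}\left(\frac{\lambda}{\lambda + \mu}\right)^{n-m+2} + \cdots + \left(\frac{\lambda}{\lambda + \mu}\right)^{n}$$

$$= \frac{1}{(\lambda + \mu)^n} \sum_{r=0}^{m-1} \binom{n}{r} \mu^r \lambda^{n-r}$$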



For calculating the frequency of encountering the down state, the boundary
states in $X^-$ are those in which m units are working and n-m are failed. Any
further component failure results in system failure. The frequency is therefore

$$f_{DN} = \binom{n}{m} \frac{\mu^m \lambda^{n-m}}{(\lambda + \mu)^n}\; m\lambda$$

The mean cycle time can be found as the reciprocal of $f_{DN}$ and the mean down
time as $U / f_{DN}$.
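
A short sketch of the m/n measures for identical independent components follows. The function name `m_out_of_n` and the rates are assumptions; the two limiting cases m = 1 (parallel redundant) and m = n (series) are included as a sanity check.

```python
from math import comb


def m_out_of_n(m, n, lam, mu):
    """Steady state measures when m of n identical components must work.

    Returns (unavailability, frequency, mean_down_time).
    """
    up, down = mu / (lam + mu), lam / (lam + mu)
    # System is down when fewer than m components are working.
    unavailability = sum(comb(n, r) * up**r * down**(n - r) for r in range(m))
    # Boundary states: exactly m working; any of the m failures brings the system down.
    frequency = comb(n, m) * up**m * down**(n - m) * m * lam
    return unavailability, frequency, unavailability / frequency


lam, mu = 0.5, 10.0  # assumed values

u_par, f_par, d_par = m_out_of_n(1, 3, lam, mu)   # parallel redundant limit
u_ser, f_ser, d_ser = m_out_of_n(3, 3, lam, mu)   # series limit
u_2of3, f_2of3, d_2of3 = m_out_of_n(2, 3, lam, mu)
```

For m = 1 the mean down time reduces to $1/(n\mu)$, and for m = n the unavailability reduces to one minus the product of the component availabilities.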



Decomposition Using the Conditional Probability Approach 

This method consists of decomposing a complex system into simple subsystems 
by the successive application of the conditional probability theorem. The idea is 
to calculate the reliability measures of the simpler subsystems and to combine 
these results to obtain the values for the system. The selection of the component 
or subsystem which is the key component or subsystem is therefore important. 
If this is not judiciously chosen, the final results will still be the same but the 
computation could be far more difficult. This method can be used to simplify 
both the state space and the network approaches. Denoting the key
component by X and representing the probability of its being up or down
by $P(X)$ and $P(\bar{X})$ respectively

$P_s$ = Probability of system success

$$= P(\text{System success} \mid X)\,P(X) + P(\text{System success} \mid \bar{X})\,P(\bar{X}) \qquad (4.8)$$

and

$P_f$ = Probability of system failure

$$= P(\text{System failure} \mid X)\,P(X) + P(\text{System failure} \mid \bar{X})\,P(\bar{X})$$

The formula for calculating the frequency of encountering the down state (or 
any other state or subset of states) has been derived by the authors in the 
analysis of interconnected electric power systems and is explained below. The 
frequency of encountering the failed state of the system consists of the 
following components: 

i. the frequency of encountering the failed state, the component X being up,
denoted by $f_1$

ii. the frequency of encountering the failed state, the component X being
down, denoted by $f_2$

iii. the frequency of encountering the failed state as a result of the failure of
component X, denoted by $f_3$

The frequency being the expectation of the state encounter rate

$f_f$ = The frequency of encountering the system failure

$$= f(\text{System failure} \mid X)\,P(X) + f(\text{System failure} \mid \bar{X})\,P(\bar{X}) + f_3$$

The expression for $f_3$ is now derived. Let the state space of the system be
denoted by $Y_0$ and $Y_1$ given that X is up or down respectively. Let the subsets
of the failed and working states be denoted by superscripts + and - respectively.
$Y_0^-$ thus denotes the set of working states given X is up and $Y_1^+$ denotes the set
of failed states given that X is down. States in $Y_0$ and $Y_1$ having the same
configuration of components, excluding X, are regarded as identical. It is assumed
that the system cannot transit from the failed state to a working state by the
failure of X. Therefore a state which is a member of $Y_0^+$ cannot be a member of
$Y_1^-$. The intersection of subsets $Y_0^-$ and $Y_1^+$ represents the states in which the
system is working if X is up but failed if X is down. Denoting these states by set
S

$$f_3 = \sum_{i \in S} P(\text{State } i \mid X)\,P(X)\,\lambda_X$$

where

$\lambda_X$ = the failure rate of X.

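If the component X is independent of the rest of the system, then the sets
$Y_0$ and $Y_1$ are equal and

$$P(\text{State } i \mid X) = P(\text{State } i \mid \bar{X})$$

Under this condition

$$\sum_{i \in S} P(\text{State } i \mid X) = \sum_{i \in Y_1^+} P(\text{State } i \mid \bar{X}) - \sum_{i \in Y_0^+} P(\text{State } i \mid X)$$

$$= P(\text{System failure} \mid \bar{X}) - P(\text{System failure} \mid X)$$

The formula for system failure frequency, therefore, becomes

$$f_f = f(\text{System failure} \mid X)\,P(X) + f(\text{System failure} \mid \bar{X})\,P(\bar{X}) + \bigl(P(\text{System failure} \mid \bar{X}) - P(\text{System failure} \mid X)\bigr)\,P(X)\,\lambda_X \qquad (4.9)$$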

The application of this approach in simplifying reliability block diagrams is 
shown later in the chapter; the application in a state space approach is shown in 
the following example, which is rather an oversimplification of interconnected 
power systems. 



Example 4.1: The electric power to an area is supplied from Station A where
two identical generating units 1 and 2 are installed. A generating unit 3 is
installed at a remote place and supplies power to the same area through a
transmission line. The failure and repair rates of the generating units are assumed
to be $\lambda$ and $\mu$ respectively.

The failure and repair rates of the transmission line are $\lambda_t$ and $\mu_t$. The system
is classified as failed when no supply is available at all.



[Figure: generating units at Station A and the unit at generating station B, the latter connected through the transmission line, feeding the supply point.]

Fig. 4.6 Functional diagram of the generation scheme



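Assuming the transmission line in the up state, the three units can be regarded
as parallel, therefore

$$P(\text{System failure} \mid \text{TL up}) = \left(\frac{\lambda}{\lambda + \mu}\right)^3$$

$$f(\text{System failure} \mid \text{TL up}) = 3\mu \left(\frac{\lambda}{\lambda + \mu}\right)^3$$

Assuming the transmission line in the down state, the system is composed of
only two parallel units, and therefore

$$P(\text{System failure} \mid \text{TL down}) = \left(\frac{\lambda}{\lambda + \mu}\right)^2$$

$$f(\text{System failure} \mid \text{TL down}) = 2\mu \left(\frac{\lambda}{\lambda + \mu}\right)^2$$

The system failure probability and frequency then follow by substituting these
values in Equations (4.8) and (4.9), with the transmission line as the key
component X, $P(X) = \mu_t/(\lambda_t + \mu_t)$ and $\lambda_X = \lambda_t$, where $\lambda_t$ and $\mu_t$
denote the failure and repair rates of the transmission line.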



The same result can be obtained by drawing the state space diagram and 
making the calculations. This exercise is left to the reader for verification. 
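
One possible way of carrying out this verification is sketched below: the decomposition result is compared with a direct enumeration of the sixteen system states. The symbols `lam_t` and `mu_t` for the transmission line rates, the numerical values and the helper names are assumptions made for illustration.

```python
from itertools import product

lam, mu = 0.1, 5.0          # generating units (assumed values)
lam_t, mu_t = 0.05, 20.0    # transmission line (assumed symbols and values)

a = lam / (lam + mu)                 # unavailability of one unit
line_up = mu_t / (lam_t + mu_t)
line_dn = lam_t / (lam_t + mu_t)

# Conditional measures: three parallel units (line up), two parallel units (line down).
p_fail_up, f_fail_up = a**3, 3 * mu * a**3
p_fail_dn, f_fail_dn = a**2, 2 * mu * a**2

# Decomposition with the transmission line as the key component.
p_fail = p_fail_up * line_up + p_fail_dn * line_dn
f_fail = (f_fail_up * line_up + f_fail_dn * line_dn
          + (p_fail_dn - p_fail_up) * line_up * lam_t)

# Direct state space calculation; bits are (unit 1, unit 2, unit 3, line), 1 = down.
fail_rates = (lam, lam, lam, lam_t)
down_prob = (a, a, a, line_dn)


def is_failed(state):
    u1, u2, u3, tl = state
    return u1 == 1 and u2 == 1 and (u3 == 1 or tl == 1)


def probability(state):
    p = 1.0
    for bit, d in zip(state, down_prob):
        p *= d if bit else 1.0 - d
    return p


p_enum = 0.0
f_enum = 0.0
for state in product((0, 1), repeat=4):
    if is_failed(state):
        p_enum += probability(state)
        continue
    # Working state: add the rate of every single failure that leads to system failure.
    for k, bit in enumerate(state):
        if bit == 0:
            nxt = state[:k] + (1,) + state[k + 1:]
            if is_failed(nxt):
                f_enum += probability(state) * fail_rates[k]
```

The enumeration counts encounters from the working subset into the failed subset, so agreement with `f_fail` also illustrates the role of the third term in the decomposition formula.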

Network Approach 

The state space approach is general and flexible but it becomes cumbersome
when the number of states becomes large. The network approach when 
applicable usually provides a shorter route to solution. The network approach is 
usually not suitable when dependent failures or repairs are involved. Such 
failures occur in standby systems, interactive systems or systems whose failure 
and repair rates respond to a fluctuating common environment. It is not 
necessary to assume the event independence in this approach, but dependent 
events can greatly increase the algebra of computations and sometimes the 
solution may become impossible. At this point it is necessary to recognize the 
difference between two types of block diagrams. 

Physical or Block Schematic Diagram 

This diagram describes the actual connections between the components. Each 
block is a component and the diagram shows the manner in which they are 
actually connected. 

Logic Diagram or Reliability Block Diagram 

This diagram describes logical connections between components. Each block 
is a component which is removed when the component fails and replaced when 
it is repaired. The connections between the blocks describe the success or 
failure of the system as a function of the states of the component. 

In the block schematic diagram, a component is not repeated because it 
represents the physical reality but in the reliability block diagram the block may 
be repeated. It is generally easy to construct the physical diagram as it follows 
from the physical layout. The reliability block diagram is usually difficult to 
prepare and in some cases a unique diagram may not exist. In the cases of 
information or power flow systems, it may be easy to construct a reliability 
block diagram. In many cases it may be difficult to do so. It requires a thorough 
understanding of the system and it is advisable to perform FMEA before 
constructing this block diagram. In the methods described below, the following 
assumptions are utilized. 

1. the system is composed of independent components or subsystems. When the 
component reliabilities are high, this assumption enables approximate results 
to be obtained for non-independent units. Approximate formulae are also 
available for the reliability of a parallel network when the components 
respond to a two state fluctuating environment 






2. each component or subsystem can be represented by a two state device, and 
the system success or failure can be expressed in terms of these two state 
devices 

3. when all the components are working, the system is successful and when 
all the components are failed, the system is failed 

4. when a group of components is working and the system is successful, the 
restoration of a failed component will not cause system failure 

5. when a group of components is failed and the system is failed, the failure 
of any additional component will not restore the system to a successful 
state. 

As the components are assumed to be independent it is not necessary to assume 
any particular distribution form for the up and down times in order to obtain 
steady state results. This will be clarified further in Chapter 6. It is only 
necessary to know the mean up and down times. The reciprocals of the mean 
up times and mean down times are failure and repair rates respectively. Assuming 
that the reliability block diagram exists, there are two main methods, the 
reliability diagram reduction method and a method based on manipulating the 
cut sets or tie sets. Both of these methods will now be described in detail. 



Network Reduction Procedure 

This method proceeds by the manipulation of the basic network structures: 

i. series structures 

ii. parallel structures 

iii. m/n structures when the n blocks originate from a common node

The method sequentially reduces the simple structures to equivalent units until 
the whole network reduces to a single unit. The necessity of assuming subsystem 
independence will be made clear in the next chapter while discussing the 
validity of equivalent transition rates in the context of subsystem reduction. 
Assuming that no subsystem is represented by more than one block, the
procedure is as follows:

1. replace all series blocks by an equivalent block. In this case the following
measures of the equivalent block can be easily obtained

$$\text{Availability} = \prod_{\forall i} p_{iu}$$

$$\text{Failure rate} = \sum_{\forall i} \lambda_i$$


2. in the resultant diagram replace the parallel and m/n substructures by
equivalent blocks. In the parallel block diagrams it is convenient to determine
the following measures of the equivalent block

$$\text{Unavailability} = \prod_{\forall i} p_{id}$$

$$\text{Repair rate} = \sum_{\forall i} \mu_i$$



The above steps are repeated until the whole network reduces to an equivalent 
block. If at any stage the network does not reduce any further, then the 
decomposition approach may be employed to generate simpler networks. In 
applying the decomposition approach to networks, the condition that the key 
component is good is equivalent to replacing this component by a short circuit 
and the condition that this component is down is equivalent to replacing it by 
an open circuit. This approach is illustrated by application to the system shown 
in Fig. 4.6. The reliability block diagram follows directly from the system 
topology and is shown in Fig. 4.7. 



Fig. 4.7 Reliability block diagram of the system shown in Fig. 4.6



In this diagram, blocks 3 and X are in series. The equivalent block 3X has the 
following measures 

$$p_{3xd} = 1 - p_{3u}p_{xu}$$

$$\lambda_{3x} = \lambda_3 + \lambda_x$$



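Components 1, 2 and 3X are now in parallel, therefore

$$\begin{aligned}
\text{Unavailability} &= p_{1d}p_{2d}p_{3xd} = p_{1d}p_{2d} - p_{1d}p_{2d}p_{3u}p_{xu} \\
&= p_{1d}p_{2d} - p_{1d}p_{2d}(1 - p_{3d})p_{xu} \\
&= p_{1d}p_{2d}(1 - p_{xu}) + p_{1d}p_{2d}p_{3d}p_{xu} \\
&= p_{1d}p_{2d}p_{xd} + p_{1d}p_{2d}p_{3d}p_{xu} \\
&= \left(\frac{\lambda}{\lambda+\mu}\right)^{2}\frac{\lambda_x}{\lambda_x+\mu_x}
 + \left(\frac{\lambda}{\lambda+\mu}\right)^{3}\frac{\mu_x}{\lambda_x+\mu_x}
\end{aligned}$$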



This result is the same as obtained in Example 4.1. The repair rate of the 
block 3X can be derived from 



$$\frac{\lambda_{3x}}{\lambda_{3x}+\mu_{3x}} = 1 - p_{3u}p_{xu}$$

that is,

$$\mu_{3x} = \frac{\lambda_{3x}\,p_{3u}p_{xu}}{1 - p_{3u}p_{xu}}$$



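The frequency of system failure

$$\begin{aligned}
&= \text{System Unavailability} \times \sum_{\forall i} \mu_i \\
&= p_{1d}p_{2d}(1 - p_{3u}p_{xu})(\mu_1 + \mu_2 + \mu_{3x}) \\
&= 2\mu\left[\left(\frac{\lambda}{\lambda+\mu}\right)^{2}\frac{\lambda_x}{\lambda_x+\mu_x}
 + \left(\frac{\lambda}{\lambda+\mu}\right)^{3}\frac{\mu_x}{\lambda_x+\mu_x}\right]
 + p_{1d}p_{2d}(\lambda_3+\lambda_x)p_{3u}p_{xu} \\
&= 2\mu\left[\left(\frac{\lambda}{\lambda+\mu}\right)^{2}\frac{\lambda_x}{\lambda_x+\mu_x}
 + \left(\frac{\lambda}{\lambda+\mu}\right)^{3}\frac{\mu_x}{\lambda_x+\mu_x}\right]
 + \left(\frac{\lambda}{\lambda+\mu}\right)^{2}\frac{\mu}{\lambda+\mu}\,\frac{\mu_x}{\lambda_x+\mu_x}(\lambda+\lambda_x) \\
&= 2\mu\left(\frac{\lambda}{\lambda+\mu}\right)^{2}\frac{\lambda_x}{\lambda_x+\mu_x}
 + 3\mu\left(\frac{\lambda}{\lambda+\mu}\right)^{3}\frac{\mu_x}{\lambda_x+\mu_x}
 + \mu_x\left(\frac{\lambda}{\lambda+\mu}\right)^{2}\frac{\mu}{\lambda+\mu}\,\frac{\lambda_x}{\lambda_x+\mu_x}
\end{aligned}$$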



which is the same as derived in Example 4.1. It should be noted that although
deriving explicit expressions can be a formidable task, numerical results even
for large systems can be obtained easily using a calculator or a computer.
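
The reduction just carried out can be checked numerically. The following is a minimal sketch rather than the authors' program: the function names and the numerical rates are assumptions, and the missing rate of each equivalent block is obtained from the frequency balance (failure rate times availability equals repair rate times unavailability).

```python
from math import prod, isclose

def component(lam, mu):
    """Two state unit described as (unavailability, failure rate, repair rate)."""
    return (lam / (lam + mu), lam, mu)

def series_block(blocks):
    """Series reduction: availabilities multiply and failure rates add."""
    avail = prod(1.0 - u for u, _, _ in blocks)
    unavail = 1.0 - avail
    lam = sum(l for _, l, _ in blocks)
    mu = lam * avail / unavail            # frequency balance
    return (unavail, lam, mu)

def parallel_block(blocks):
    """Parallel reduction: unavailabilities multiply and repair rates add."""
    unavail = prod(u for u, _, _ in blocks)
    mu = sum(m for _, _, m in blocks)
    lam = mu * unavail / (1.0 - unavail)  # frequency balance
    return (unavail, lam, mu)

# Hypothetical rates per year: components 1, 2, 3 identical, component X different
lam, mu, lam_x, mu_x = 2.0, 438.0, 1.0, 100.0
c1 = c2 = c3 = component(lam, mu)
cx = component(lam_x, mu_x)

block_3x = series_block([c3, cx])             # 3 and X in series
system = parallel_block([c1, c2, block_3x])   # 1, 2 and 3X in parallel

unavailability = system[0]
failure_frequency = unavailability * system[2]
print("unavailability   :", unavailability)
print("failure frequency:", failure_frequency)
```

For any choice of rates the two printed values should agree with the explicit expressions for the unavailability and the frequency of system failure obtained above.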




Example 4.2: The reliability block diagram of a system is shown in Fig. 4.8.
The mean up times and down times of the various blocks are:

$$\mathrm{MUT}_1 = \mathrm{MUT}_2 = \mathrm{MUT}_3 = \mathrm{MUT}_4 = \mathrm{MUT}_5 = k\ \text{yr}$$

$$\mathrm{MDT}_1 = \mathrm{MDT}_2 = \mathrm{MDT}_3 = \mathrm{MDT}_4 = \mathrm{MDT}_5 = 20\ \text{hr}$$

Assuming that all components are statistically independent, calculate the
reliability measures for continuity between s and t.



Fig. 4.8 Reliability block diagram for Example 4.2 



It is obvious that no simple series or parallel paths exist in this reliability 
structure. Applying the principle of decomposition, the block diagram can be 
split into simple series and parallel configurations as shown in Fig. 4.9 in the 
form of a tree. Referring to Formulae (4.8) and (4.9), the values required are 
the probability and frequency of system failure given 5 is good and then given 5 
is bad. The system measures can then be calculated in terms of these values. 



The failure and repair rates of the components are:

$$\lambda_1 = \lambda_2 = \lambda_3 = \lambda_4 = \lambda_5 = \lambda$$

and

$$\mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5 = \mu$$

The probability of a component being down

$$p_d = \frac{\lambda}{\lambda+\mu}$$





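Given 5 Good

$$p_{12d} = p_d^{2}$$

$$p_{12u} = 1 - p_{12d} = 1 - p_d^{2}$$

$$f_{12} = p_{12d}(\mu_1 + \mu_2) = 2\mu p_d^{2}$$

The equivalent failure rate of component 12 is given by (3.37)

$$\lambda_{12} = f_{12}/p_{12u} = 2\mu p_d^{2}/(1 - p_d^{2})$$

and the equivalent repair rate

$$\mu_{12} = f_{12}/p_{12d} = 2\mu$$

The values for the equivalent block 34 are the same as for 12 because they
consist of identical components. The equivalent blocks 12 and 34 are in series
and therefore

$$\begin{aligned}
P(\text{System failure} \mid 5\ \text{Good}) &= p_{1234d} \\
&= 1 - p_{12u}p_{34u} \\
&= 1 - (1 - p_d^{2})^{2} = 2p_d^{2} - p_d^{4}
\end{aligned}$$

$$\begin{aligned}
f(\text{System failure} \mid 5\ \text{Good}) &= p_{1234u}(\lambda_{12} + \lambda_{34}) \\
&= 4\mu p_d^{2}(1 - p_d^{2})
\end{aligned}$$

Given 5 Bad

$$p_{13u} = p_{1u}p_{3u} = (1 - p_d)^{2}$$

$$p_{13d} = 1 - p_{13u} = 2p_d - p_d^{2}$$

$$f_{13} = p_{13u}(\lambda_1 + \lambda_3) = 2\lambda(1 - p_d)^{2}$$

$$\mu_{13} = f_{13}/p_{13d} = \frac{2\lambda(1 - p_d)^{2}}{2p_d - p_d^{2}}$$

$$\begin{aligned}
P(\text{System failure} \mid 5\ \text{Bad}) &= p_{1324d} = p_{13d}p_{24d} \\
&= (2p_d - p_d^{2})^{2}
\end{aligned}$$

$$\begin{aligned}
f(\text{System failure} \mid 5\ \text{Bad}) &= p_{1324d}(\mu_{13} + \mu_{24}) \\
&= 4\mu p_d(1 - p_d)(2p_d - p_d^{2})
\end{aligned}$$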
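These results can now be combined using formulae (4.8) and (4.9).

$$\begin{aligned}
P_f &= P(\text{System failure} \mid 5\ \text{Good})\,P(5\ \text{Good})
 + P(\text{System failure} \mid 5\ \text{Bad})\,P(5\ \text{Bad}) \\
&= (2p_d^{2} - p_d^{4})(1 - p_d) + (2p_d - p_d^{2})^{2}p_d \\
&= 2p_d^{2} + 2p_d^{3} - 5p_d^{4} + 2p_d^{5}
\end{aligned}$$

and

$$\begin{aligned}
f_f &= f(\text{System failure} \mid 5\ \text{Good})\,P(5\ \text{Good})
 + f(\text{System failure} \mid 5\ \text{Bad})\,P(5\ \text{Bad}) \\
&\quad + \big(P(\text{System failure} \mid 5\ \text{Bad})
 - P(\text{System failure} \mid 5\ \text{Good})\big)P(5\ \text{Good})\,\lambda_5 \\
&= 4\mu p_d^{2}(1 - p_d^{2})(1 - p_d) + 4\mu p_d^{2}(2p_d - p_d^{2})(1 - p_d) \\
&\quad + \big((2p_d - p_d^{2})^{2} - 2p_d^{2} + p_d^{4}\big)\mu p_d \\
&= \mu(4p_d^{2} + 6p_d^{3} - 20p_d^{4} + 10p_d^{5})
\end{aligned}$$

$$\text{Mean cycle time} = 1/f_f$$

and

$$\text{Mean Down Time} = P_f/f_f$$
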
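
The decomposition of Example 4.2 is easy to verify numerically. The sketch below is an illustration under assumed rates (two failures and 438 repairs per year); the function name `bridge_by_decomposition` is hypothetical. It follows the same steps, conditioning on component 5 being good or bad, and compares the result with the two closed-form polynomials in $p_d$.

```python
from math import isclose

def bridge_by_decomposition(lam, mu):
    """Probability and frequency of system failure, keyed on component 5."""
    pd = lam / (lam + mu)
    pu = 1.0 - pd

    # Given 5 good: (1 parallel 2) in series with (3 parallel 4)
    p12d = pd ** 2
    f12 = p12d * 2.0 * mu
    lam12 = f12 / (1.0 - p12d)                 # equivalent failure rate of block 12
    p_fail_good = 1.0 - (1.0 - p12d) ** 2
    f_fail_good = (1.0 - p12d) ** 2 * (2.0 * lam12)

    # Given 5 bad: (1 series 3) in parallel with (2 series 4)
    p13u = pu ** 2
    p13d = 1.0 - p13u
    f13 = p13u * 2.0 * lam
    mu13 = f13 / p13d                          # equivalent repair rate of block 13
    p_fail_bad = p13d ** 2
    f_fail_bad = p_fail_bad * (2.0 * mu13)

    # Combine the conditional results; P(5 good) = pu, P(5 bad) = pd
    p_f = p_fail_good * pu + p_fail_bad * pd
    f_f = (f_fail_good * pu + f_fail_bad * pd
           + (p_fail_bad - p_fail_good) * pu * lam)
    return p_f, f_f

lam, mu = 2.0, 438.0                           # assumed rates per year
p_f, f_f = bridge_by_decomposition(lam, mu)

pd = lam / (lam + mu)
poly_p = 2*pd**2 + 2*pd**3 - 5*pd**4 + 2*pd**5
poly_f = mu * (4*pd**2 + 6*pd**3 - 20*pd**4 + 10*pd**5)

mean_cycle_time = 1.0 / f_f
mean_down_time = p_f / f_f
print(p_f, poly_p)
print(f_f, poly_f)
print(mean_cycle_time, mean_down_time)
```
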
Cut Set or Tie Set Methods 

The network reduction approach is quite useful when the block diagram consists 
essentially of series and parallel structures. When the reliability block diagram 
is complex, decomposition into simple series and parallel paths may not be 
easy. The process could be quite difficult to program because it would require 
a lot of scanning. The approach using cut sets or tie sets is especially useful for 
computer applications. The following definitions are useful in appreciating this 
approach. 

Simple Path 

If in going from node x to y no node is traversed more than once, the path is
simple. For example, in Fig. 4.8, 1-5-4 is a simple path.

Connected Subnetwork

A subnetwork is said to be connected if there exists a simple path between every
pair of nodes.

Cut Set

This is a set of components whose failure alone will cause system failure. A
minimal cut has no proper subset of components whose failure alone will cause
system failure.




The minimal cuts for the reliability block diagram shown in Fig. 4.8 are 
listed in Table 4.1. 

Table 4.1 Minimal cuts between input and output nodes in Fig. 4.8

Minimal cut set    Components in the set

$C_1$              1, 2

$C_2$              3, 4

$C_3$              1, 4, 5

$C_4$              2, 3, 5



Path or Tie Set 

This is a set of components whose functioning alone will guarantee system 
success. A minimal path or tie set has no proper subset of components whose 
functioning alone would ensure system success. 

In this approach it is possible to proceed either through cut set manipulation 
or tie set manipulation. The choice is usually dictated by their relative numbers. 
Both methods are explained. 
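
For a small network the minimal tie sets and cut sets can be found by exhaustive search directly from these definitions. The sketch below assumes a bridge topology for Fig. 4.8 with hypothetical node names `s`, `a`, `b` and `t` (components 1 and 2 leave `s`, components 3 and 4 enter `t`, and component 5 joins `a` and `b`); the function names are also illustrative.

```python
from itertools import combinations

# Assumed topology: component -> the two nodes it joins
COMPONENTS = {1: ("s", "a"), 2: ("s", "b"), 3: ("a", "t"), 4: ("b", "t"), 5: ("a", "b")}

def system_works(working):
    """True if the working components connect node s to node t."""
    reached = {"s"}
    grew = True
    while grew:
        grew = False
        for c in working:
            u, v = COMPONENTS[c]
            if (u in reached) != (v in reached):
                reached.update((u, v))
                grew = True
    return "t" in reached

def minimal_sets(has_property):
    """Minimal component sets with a monotone property, searched by increasing size."""
    found = []
    names = sorted(COMPONENTS)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            candidate = frozenset(combo)
            # a candidate containing an already found set is not minimal
            if has_property(candidate) and not any(m <= candidate for m in found):
                found.append(candidate)
    return found

tie_sets = minimal_sets(system_works)
cut_sets = minimal_sets(lambda failed: not system_works(set(COMPONENTS) - failed))

print("tie sets:", [sorted(t) for t in tie_sets])
print("cut sets:", [sorted(c) for c in cut_sets])
```

The search examines every subset of components, so it is only practical for small diagrams; it is intended as a check on sets obtained by inspection.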

Tie Set Manipulation 

In the minimal path all the blocks constituting it are in series. The failure of any 
one of these blocks would render that path ineffective. The minimal paths 
themselves are, however, in parallel as the system will be successful so long as 
there is even one path available between the input and output of the reliability 
block diagram. Denoting the path available and unavailable by $T$ and $\bar{T}$
respectively

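$$\begin{aligned}
P_s &= P(T_1 \cup T_2 \cup T_3 \cup \dots \cup T_n) \\
&= [P(T_1) + P(T_2) + P(T_3) + \dots + P(T_n)]
 && \leftarrow \tbinom{n}{1}\ \text{terms} \\
&\quad - [P(T_1 \cap T_2) + P(T_1 \cap T_3) + \dots + P(T_i \cap T_j)]_{i \neq j}
 && \leftarrow \tbinom{n}{2}\ \text{terms} \\
&\quad + [P(T_1 \cap T_2 \cap T_3) + P(T_1 \cap T_2 \cap T_4) + \dots
 + P(T_i \cap T_j \cap T_k)]_{i \neq j \neq k}
 && \leftarrow \tbinom{n}{3}\ \text{terms} \\
&\quad \;\vdots \\
&\quad + (-1)^{n-1}[P(T_1 \cap T_2 \cap \dots \cap T_n)]
 && \leftarrow \tbinom{n}{n}\ \text{terms}
\end{aligned} \tag{4.10}$$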

The total number of terms in this expression is $2^n - 1$ where $n$ is the number
of tie sets. It can be seen from Equation (4.10) that the independence of
components need not be postulated. All that is needed is the evaluation of all
the terms of the expansion (4.10). In the case of dependent failures, however,
the entire state transition diagram may have to be drawn to evaluate the above
terms. This method is therefore of importance for independent components.
As the number of tie sets increases, however, the expansion of all the terms
becomes a formidable task. In such cases useful approximate formulae can be
obtained by Boole's inequality.

$$P(T_1 \cup T_2 \cup \dots \cup T_n) \le P(T_1) + P(T_2) + P(T_3) + \dots + P(T_n)$$

Therefore if only the first row in the expansion (4.10) is calculated, the result
will be an upper bound approximation. This upper bound becomes a good
approximation when the component reliabilities are low. As the components are
assumed independent

$$P(T_i) = \prod_{j \in T_i} p_{ju}$$

When the number of tie sets is relatively small so that explicit relationships 
can be derived, the following procedure can be used to obtain exact values: 

1. Calculate the tie set availabilities

$$P(T_i) = \prod_{j \in T_i} p_{ju}$$

2. The formula for system availability assuming the paths are independent is

$$P_s = 1 - \prod_{\forall i}\big(1 - P(T_i)\big)$$

3. Independence of paths has been assumed in the above expression. It will
contain some terms containing powers $p_{ju}^{k}$ with $k > 1$. These terms are
introduced due to the path independence assumption in calculating the 2nd to nth
row in expansion (4.10). Therefore, the exact result for $P_s$ may be obtained by
replacing $p_{ju}^{k}$ by $p_{ju}$. This procedure is, however, possible only when
the number of tie sets is small and an explicit expression can be easily derived.
When only the numerical results are to be obtained, either the expansion (4.10)
may be evaluated or the upper bound approximation may be calculated.
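
When only numerical results are wanted, the expansion (4.10) can be evaluated term by term. The sketch below assumes independent components, so that the probability of an intersection of tie sets is the product of the availabilities of the distinct components involved. The tie sets are those of a five-component bridge (an assumed reading of Fig. 4.8) and the function names are illustrative.

```python
from itertools import combinations
from math import prod, isclose

TIE_SETS = [{1, 3}, {2, 4}, {1, 4, 5}, {2, 3, 5}]   # assumed bridge topology

def union_rows(event_sets, p):
    """Unsigned row sums of the inclusion-exclusion expansion, orders 1..n."""
    rows = []
    for order in range(1, len(event_sets) + 1):
        row = 0.0
        for group in combinations(event_sets, order):
            members = set().union(*group)        # distinct components involved
            row += prod(p[j] for j in members)
        rows.append(row)
    return rows

def union_probability(event_sets, p):
    """Exact probability of the union: rows added with alternating signs."""
    return sum((-1) ** k * row for k, row in enumerate(union_rows(event_sets, p)))

results = {}
for p_up in (0.9, 0.1):
    p = {j: p_up for j in range(1, 6)}
    exact = union_probability(TIE_SETS, p)
    first_row = union_rows(TIE_SETS, p)[0]       # Boole upper bound
    results[p_up] = (exact, first_row)
    print(p_up, exact, first_row)
```

With a component availability of 0.9 the first row exceeds unity and is useless as a bound, whereas with 0.1 it is close to the exact value, in line with the remark that the tie set bound is good when component reliabilities are low.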




Cut Set Manipulation 

In the minimal cut the blocks are in parallel as all of them must fail to produce
a cut. The minimal cuts themselves are, however, in series as even a single cut
ensures failure. Denoting the failure of the ith cut set by $C_i$, the probability
of system failure is

$$P_f = P(C_1 \cup C_2 \cup C_3 \cup \dots \cup C_m) \tag{4.11}$$

and

$$P_s = 1 - P_f$$

The Expression (4.11) can be expanded in the same manner as (4.10) and all
the comments on (4.10) apply equally well to this expansion. By Boole's
inequality

$$P_f \le P(C_1) + P(C_2) + \dots + P(C_m)$$

This upper bound of the probability of failure is useful in the high reliability
region. The above inequality can be manipulated into the following form

$$P_s \ge 1 - [P(C_1) + P(C_2) + \dots + P(C_m)]$$

This lower bound is good when the component reliabilities are high. The
components being assumed independent

$$P(C_i) = \prod_{j \in C_i} p_{jd}$$

When the explicit formulae can be derived, the following procedure can be
used to obtain the exact values.

1. Calculate the cut set availabilities

$$P(\bar{C}_i) = 1 - \prod_{j \in C_i} p_{jd}$$

2. The formula for system success assuming the cut set independence is

$$P_s = \prod_{\forall i} P(\bar{C}_i)$$

3. Replace $p_{jd}^{k}$ by $p_{jd}$ for every power $k > 1$; the resulting formula
will be the exact expression for system reliability.
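
The three-step procedure lends itself to a small symbolic calculation. In the sketch below (an illustration, with hypothetical function names) a polynomial in the component unavailabilities is stored as a dictionary mapping a set of components to a coefficient. Multiplying two terms takes the union of their component sets, which is exactly the replacement of $p_{jd}^{k}$ by $p_{jd}$; here the replacement is applied during the multiplication rather than after the full expansion, which gives the same polynomial.

```python
from math import prod, isclose

CUT_SETS = [{1, 2}, {3, 4}, {1, 4, 5}, {2, 3, 5}]   # minimal cuts of Table 4.1

def multiply(poly_a, poly_b):
    """Product of two polynomials with repeated factors p_jd collapsed."""
    out = {}
    for set_a, coef_a in poly_a.items():
        for set_b, coef_b in poly_b.items():
            key = set_a | set_b                  # p_jd * p_jd -> p_jd
            out[key] = out.get(key, 0) + coef_a * coef_b
    return {k: c for k, c in out.items() if c != 0}

def success_polynomial(cut_sets):
    """Steps 1 to 3: product over the cuts of (1 - product of p_jd)."""
    poly = {frozenset(): 1}
    for cut in cut_sets:
        poly = multiply(poly, {frozenset(): 1, frozenset(cut): -1})
    return poly

def evaluate(poly, pd):
    return sum(c * prod(pd[j] for j in s) for s, c in poly.items())

poly = success_polynomial(CUT_SETS)
for members, coef in sorted(poly.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(members), coef)

pd = {j: 0.1 for j in range(1, 6)}               # assumed component unavailability
p_s = evaluate(poly, pd)
print("P_s =", p_s)
```

For identical components the printed polynomial reduces to $1 - 2p_d^2 - 2p_d^3 + 5p_d^4 - 2p_d^5$.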




Frequency Calculation Using the Cut Set Approach

In order to understand the derivation of the failure frequency formula, the
relationship between the minimal cut set and the system state space should be
understood. Consider a minimal cut set $C_i$ which has components $l$ and $m$ as
its members. This means that if the components $l$ and $m$ fail, the system will
be failed irrespective of the states of the other components of the system. The
failure of the members of $C_i$ is equivalent to the system being in subset $S_i$
of the state space $S$, where

$S_i$ = { $s_j$ : in the state $s_j$, the components $l$ and $m$ are failed and the
other components exist in a particular state }

The state in which $l$ and $m$ are failed and all of the other components are
functioning is called the vertex state of the subset $S_i$. The system can transit
from the vertex state either upwards (in the sense of fewer components in the
failed state) by the repair of the failed components $l$ and $m$ or it can transit
downwards (in the sense of more components in the failed state) by the successive
failures of more components. The subset $S_i$ is constituted by the vertex state
and the states generated by the downward transitions from it. The system could
transit out of $S_i$ by the repair of $l$ or $m$ and therefore the frequency of
encountering subset $S_i$ is

$$\begin{aligned}
f_i &= \sum_{s_j \in S_i} P(s_j) \sum_{k \in C_i} \mu_k \\
&= \Big(\sum_{s_j \in S_i} P(s_j)\Big)\Big(\sum_{k \in C_i} \mu_k\Big) \\
&= P(C_i)\,\bar{\mu}_i
\end{aligned}$$

where

$$\bar{\mu}_i = \sum_{k \in C_i} \mu_k$$



The relationship between the cut set and its equivalent state space subset
can be more clearly understood by reference to Fig. 4.10 where the cut set $C_1$
of Fig. 4.8 and the equivalent subset $S_1$ are shown, $S_1 = \{s_1, s_2, \dots, s_8\}$.
The states which are members of $S_1$ are generated by successive failures of
components 3, 4 and 5 from the vertex state $s_1$. From any state $s_j \in S_1$,
the system could transit out of $S_1$ by the repair of component 1 or 2 which are
the members of $C_1$. The frequency of encountering $S_1$ is

$$\begin{aligned}
f_1 &= \sum_{s_j \in S_1} P(s_j)(\mu_1 + \mu_2) \\
&= (\mu_1 + \mu_2)P(C_1) \\
&= (\mu_1 + \mu_2)(p_{1d}p_{2d})
\end{aligned}$$



Now consider another minimal cut set $C_k$ and its equivalent subset $S_k$ of the
state space $S$. The reader can appreciate the arguments by taking $C_k = C_3$ in
Fig. 4.10 where $C_3$ is the minimal cut set of Fig. 4.8. If $S_i$ and $S_k$ were
mutually exclusive, there could not be any transition between $S_i$ and $S_k$.
This can be seen as follows. Suppose that from a state of subset $S_i$ a
transition is possible to a state of subset $S_k$ by the failure of a component;
then that state and all the states generated from it by the downward transitions
will be common to both $S_i$ and $S_k$. The two subsets will not therefore remain
mutually exclusive. Reasoning backwards, there cannot be transitions between two
mutually exclusive subsets equivalent to two minimal cut sets. In such a case the
frequency contribution due to $S_i$ and $S_k$ is $f_i + f_k$. In practice, however,
the state space subsets representing minimal cut sets overlap and the frequency
formula for this case can be derived by referring to the Venn diagram in Fig. 4.11.

Let

$$S_i = A_1 + A_2$$

and

$$S_k = A_3 + A_2$$

so that

$$S_i \cap S_k = A_2$$

The frequency of encountering the subset $S_i \cup S_k$ is given by

$$f(S_i \cup S_k) = P(A_1)\bar{\mu}_i + P(A_3)\bar{\mu}_k + P(A_2)(\bar{\mu}_i \cap \bar{\mu}_k) \tag{4.12}$$

where

$$\bar{\mu}_i = \sum_{j \in C_i} \mu_j, \qquad \bar{\mu}_k = \sum_{j \in C_k} \mu_j$$

and

$$\bar{\mu}_i \cap \bar{\mu}_k = \sum_{j \in C_i \cap C_k} \mu_j$$

Fig. 4.10 The equivalence between minimal cut sets and
state space representations


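That is, $\bar{\mu}_i \cap \bar{\mu}_k$ represents the summation of the repair rates
of the components common to both minimal cut sets. For example for $C_1$ and
$C_3$, $\bar{\mu}_1 \cap \bar{\mu}_3 = \mu_1$ since only component 1 is common to
both cut sets.
Equation (4.12) can be written as

$$\begin{aligned}
f(S_i \cup S_k) &= P(A_1 + A_2)\bar{\mu}_i + P(A_3 + A_2)\bar{\mu}_k
 + P(A_2)(\bar{\mu}_i \cap \bar{\mu}_k) - P(A_2)\bar{\mu}_i - P(A_2)\bar{\mu}_k \\
&= P(S_i)\bar{\mu}_i + P(S_k)\bar{\mu}_k
 - P(S_i \cap S_k)(\bar{\mu}_i + \bar{\mu}_k - \bar{\mu}_i \cap \bar{\mu}_k) \\
&= f(S_i) + f(S_k) - f(S_i \cap S_k) && (4.13) \\
&= P(C_i)\bar{\mu}_i + P(C_k)\bar{\mu}_k
 - P(C_i \cap C_k)(\bar{\mu}_i \cup \bar{\mu}_k) && (4.14)
\end{aligned}$$

In the above expression $\bar{\mu}_i \cup \bar{\mu}_k$ is the summation of the repair
rates of the components which belong either to $C_i$ or to $C_k$ or to both i.e.

$$\bar{\mu}_i \cup \bar{\mu}_k = \sum_{j \in C_i \cup C_k} \mu_j$$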





Fig. 4.11 Venn diagram for overlapping subsets representing
two minimal cut sets



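Equations (4.13) and (4.14) can be extended to the union of three subsets
representing three minimal cut sets as follows

$$\begin{aligned}
f(S_i \cup S_k \cup S_l) &= f(S_i \cup S_k) + f(S_l) - f((S_i \cup S_k) \cap S_l) \\
&= f(S_i) + f(S_k) - f(S_i \cap S_k) + f(S_l)
 - f((S_i \cap S_l) \cup (S_k \cap S_l)) \\
&= f(S_i) + f(S_k) + f(S_l) - f(S_i \cap S_k) \\
&\quad - [f(S_i \cap S_l) + f(S_k \cap S_l) - f(S_i \cap S_k \cap S_l)] \\
&= f(S_i) + f(S_k) + f(S_l) \\
&\quad - [f(S_i \cap S_k) + f(S_i \cap S_l) + f(S_k \cap S_l)]
 + f(S_i \cap S_k \cap S_l) \\
&= [P(C_i)\bar{\mu}_i + P(C_k)\bar{\mu}_k + P(C_l)\bar{\mu}_l] \\
&\quad - [P(C_i \cap C_k)(\bar{\mu}_i \cup \bar{\mu}_k)
 + P(C_i \cap C_l)(\bar{\mu}_i \cup \bar{\mu}_l)
 + P(C_k \cap C_l)(\bar{\mu}_k \cup \bar{\mu}_l)] \\
&\quad + P(C_i \cap C_k \cap C_l)(\bar{\mu}_i \cup \bar{\mu}_k \cup \bar{\mu}_l)
\end{aligned}$$

where as before $\bar{\mu}_i \cup \bar{\mu}_k \cup \bar{\mu}_l$ is the summation of the
repair rates of components which belong to any or all of the minimal cut sets
$C_i$, $C_k$ and $C_l$. Proceeding in the above manner, the formula can be
extended to the general case of $m$ cut sets

$$\begin{aligned}
f_f &= f(S_1 \cup S_2 \cup S_3 \cup \dots \cup S_m) \\
&= [P(C_1)\bar{\mu}_1 + P(C_2)\bar{\mu}_2 + \dots + P(C_m)\bar{\mu}_m]
 && \leftarrow \tbinom{m}{1}\ \text{terms, order 1} \\
&\quad - [P(C_1 \cap C_2)(\bar{\mu}_1 \cup \bar{\mu}_2)
 + P(C_1 \cap C_3)(\bar{\mu}_1 \cup \bar{\mu}_3) + \dots \\
&\qquad + P(C_i \cap C_j)(\bar{\mu}_i \cup \bar{\mu}_j)]
 && \leftarrow \tbinom{m}{2}\ \text{terms, order 2} \\
&\quad + [P(C_1 \cap C_2 \cap C_3)(\bar{\mu}_1 \cup \bar{\mu}_2 \cup \bar{\mu}_3) + \dots \\
&\qquad + P(C_i \cap C_j \cap C_k)(\bar{\mu}_i \cup \bar{\mu}_j \cup \bar{\mu}_k)]
 && \leftarrow \tbinom{m}{3}\ \text{terms, order 3} \\
&\quad \;\vdots \\
&\quad (-1)^{m-1}[P(C_1 \cap C_2 \cap C_3 \cap \dots \cap C_m)
 (\bar{\mu}_1 \cup \bar{\mu}_2 \cup \bar{\mu}_3 \cup \dots \cup \bar{\mu}_m)]
 && \text{order } m
\end{aligned} \tag{4.15}$$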



The calculation of the terms of the above expansion involves simple
multiplications and additions. The contribution by terms beyond the third or
fourth order is quite insignificant and the calculations can be suitably truncated.
In high reliability systems the first row alone gives good results

$$f_{U1} = \sum_{\forall i} P(C_i)\,\bar{\mu}_i$$

This equation gives the upper bound approximation for the frequency of system
failure. The lower bound to the frequency is

$$f_{L1} = f_{U1} - \sum_{i < j} P(C_i \cap C_j)(\bar{\mu}_i \cup \bar{\mu}_j)$$

$f_{U1}$ and $f_{L1}$ are the first upper and lower bounds to the approximation of
$f_f$ given by Equation (4.15). Successively closer alternating bounds can be
obtained by the additions of odd and even order terms. The computation can
be truncated when the margin between upper and lower bounds becomes
negligible.
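
The row-by-row evaluation of (4.15) and the resulting alternating bounds can be sketched as follows. This is an illustration under stated assumptions: components are independent, the cut sets are those of Table 4.1 for a five-component bridge, the rates (two failures and 438 repairs per year for every component) are chosen only for the example, and the function names are hypothetical.

```python
from itertools import combinations
from math import prod, isclose

CUT_SETS = [{1, 2}, {3, 4}, {1, 4, 5}, {2, 3, 5}]

def frequency_rows(cut_sets, pd, mu):
    """Unsigned row sums of expansion (4.15), orders 1..m."""
    rows = []
    for order in range(1, len(cut_sets) + 1):
        row = 0.0
        for group in combinations(cut_sets, order):
            members = set().union(*group)
            probability = prod(pd[j] for j in members)   # all members failed
            repair_sum = sum(mu[j] for j in members)     # union of repair rates
            row += probability * repair_sum
        rows.append(row)
    return rows

def alternating_bounds(rows):
    """Partial sums with alternating signs: upper, lower, upper, ... bounds."""
    bounds, partial = [], 0.0
    for k, row in enumerate(rows):
        partial += row if k % 2 == 0 else -row
        bounds.append(partial)
    return bounds

lam, mu_c = 2.0, 438.0                      # assumed rates per year
p = lam / (lam + mu_c)
pd = {j: p for j in range(1, 6)}
mu = {j: mu_c for j in range(1, 6)}

bounds = alternating_bounds(frequency_rows(CUT_SETS, pd, mu))
exact = mu_c * (4*p**2 + 6*p**3 - 20*p**4 + 10*p**5)
for order, value in enumerate(bounds, start=1):
    print("up to order", order, ":", value)
print("closed form     :", exact)
```

The last partial sum is the exact frequency; the earlier ones bracket it from above and below in turn, so the loop could be stopped as soon as two successive values agree to the accuracy required.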

The use of the cut set manipulation equations can be illustrated by application
to the network of Fig. 4.8. The minimal cuts for it have already been enumerated
in Table 4.1. All the components are assumed identical, each having $\lambda$ and
$\mu$ as the failure and repair rates. The probability of component failure

$$p_d = \frac{\lambda}{\lambda+\mu}$$

The expression for probability of system failure is

$$\begin{aligned}
P_f &= P(C_1 \cup C_2 \cup C_3 \cup C_4) \\
&= [P(C_1) + P(C_2) + P(C_3) + P(C_4)] \\
&\quad - [P(C_1 \cap C_2) + P(C_1 \cap C_3) + P(C_1 \cap C_4) + P(C_2 \cap C_3)
 + P(C_3 \cap C_4) + P(C_2 \cap C_4)] \\
&\quad + [P(C_1 \cap C_2 \cap C_3) + P(C_1 \cap C_2 \cap C_4)
 + P(C_2 \cap C_3 \cap C_4) + P(C_1 \cap C_3 \cap C_4)] \\
&\quad - [P(C_1 \cap C_2 \cap C_3 \cap C_4)]
\end{aligned}$$

The various terms are evaluated below

$$P(C_1) = P(C_2) = (p_d)^2$$

$$P(C_3) = P(C_4) = (p_d)^3$$

$$P(C_1 \cap C_2) = P(C_1 \cap C_3) = P(C_1 \cap C_4) = P(C_2 \cap C_3) = P(C_2 \cap C_4) = (p_d)^4$$

$$P(C_3 \cap C_4) = (p_d)^5$$

$$P(C_1 \cap C_2 \cap C_3) = P(C_1 \cap C_2 \cap C_4) = P(C_2 \cap C_3 \cap C_4) = P(C_1 \cap C_3 \cap C_4) = (p_d)^5$$

$$P(C_1 \cap C_2 \cap C_3 \cap C_4) = (p_d)^5$$

Substituting these values

$$P_f = 2p_d^{2} + 2p_d^{3} - 5p_d^{4} + 2p_d^{5}$$

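Calculation of frequency of failure.

The formula for failure frequency can be written as

$$\begin{aligned}
f_f &= [P(C_1)\bar{\mu}_1 + P(C_2)\bar{\mu}_2 + P(C_3)\bar{\mu}_3 + P(C_4)\bar{\mu}_4] \\
&\quad - [P(C_1 \cap C_2)(\bar{\mu}_1 \cup \bar{\mu}_2) + P(C_1 \cap C_3)(\bar{\mu}_1 \cup \bar{\mu}_3) \\
&\qquad + P(C_1 \cap C_4)(\bar{\mu}_1 \cup \bar{\mu}_4) + P(C_2 \cap C_3)(\bar{\mu}_2 \cup \bar{\mu}_3) \\
&\qquad + P(C_3 \cap C_4)(\bar{\mu}_3 \cup \bar{\mu}_4) + P(C_2 \cap C_4)(\bar{\mu}_2 \cup \bar{\mu}_4)] \\
&\quad + [P(C_1 \cap C_2 \cap C_3)(\bar{\mu}_1 \cup \bar{\mu}_2 \cup \bar{\mu}_3) \\
&\qquad + P(C_1 \cap C_2 \cap C_4)(\bar{\mu}_1 \cup \bar{\mu}_2 \cup \bar{\mu}_4) \\
&\qquad + P(C_2 \cap C_3 \cap C_4)(\bar{\mu}_2 \cup \bar{\mu}_3 \cup \bar{\mu}_4) \\
&\qquad + P(C_1 \cap C_3 \cap C_4)(\bar{\mu}_1 \cup \bar{\mu}_3 \cup \bar{\mu}_4)] \\
&\quad - [P(C_1 \cap C_2 \cap C_3 \cap C_4)(\bar{\mu}_1 \cup \bar{\mu}_2 \cup \bar{\mu}_3 \cup \bar{\mu}_4)]
\end{aligned}$$

Now

$$\bar{\mu}_1 = \bar{\mu}_2 = 2\mu$$

$$\bar{\mu}_3 = \bar{\mu}_4 = 3\mu$$

$$\bar{\mu}_1 \cup \bar{\mu}_2 = \bar{\mu}_1 \cup \bar{\mu}_3 = \bar{\mu}_1 \cup \bar{\mu}_4 = \bar{\mu}_2 \cup \bar{\mu}_3 = \bar{\mu}_2 \cup \bar{\mu}_4 = 4\mu$$

$$\bar{\mu}_3 \cup \bar{\mu}_4 = 5\mu$$

$$\bar{\mu}_1 \cup \bar{\mu}_2 \cup \bar{\mu}_3 = \bar{\mu}_1 \cup \bar{\mu}_2 \cup \bar{\mu}_4 = \bar{\mu}_2 \cup \bar{\mu}_3 \cup \bar{\mu}_4 = \bar{\mu}_1 \cup \bar{\mu}_3 \cup \bar{\mu}_4 = 5\mu$$

and

$$\bar{\mu}_1 \cup \bar{\mu}_2 \cup \bar{\mu}_3 \cup \bar{\mu}_4 = 5\mu$$

Substituting these values

$$f_f = (4p_d^{2} + 6p_d^{3} - 20p_d^{4} + 10p_d^{5})\mu$$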

The numerical values were obtained by keeping $\mu$ equal to 438 repairs per year,
i.e. a mean down time of the component equal to 20 hours, and varying the
failure rate from two failures per year to 219 failures per year. These results
are shown in Table 4.2. The purpose of this study is to show the difference
between exact and upper bound approximate results as a function of the
component reliability. It can be seen that when component reliability is
close to unity, the upper bound approximation gives almost exact results.
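
A study of this kind is simple to reproduce. The sketch below uses the closed-form expressions for the bridge network and takes the first row of each expansion as the upper bound. The intermediate failure rates and the function names are assumptions made for the illustration; only the end points of the range and the repair rate come from the text, and the printed figures are not claimed to match the published table.

```python
MU = 438.0                                   # repairs per year (8760 h / 20 h)
FAILURE_RATES = (2.0, 10.0, 50.0, 219.0)     # failures per year; middle values assumed

def bridge_measures(lam, mu=MU):
    """Exact and first-row (upper bound) probability and frequency of failure."""
    p = lam / (lam + mu)
    exact_p = 2*p**2 + 2*p**3 - 5*p**4 + 2*p**5
    upper_p = 2*p**2 + 2*p**3
    exact_f = mu * (4*p**2 + 6*p**3 - 20*p**4 + 10*p**5)
    upper_f = mu * (4*p**2 + 6*p**3)
    return exact_p, upper_p, exact_f, upper_f

def relative_error(approx, exact):
    return (approx - exact) / exact

study = {lam: bridge_measures(lam) for lam in FAILURE_RATES}
for lam, (exact_p, upper_p, exact_f, upper_f) in study.items():
    print(f"lambda={lam:6.1f}  "
          f"P_f error={relative_error(upper_p, exact_p):.2e}  "
          f"f_f error={relative_error(upper_f, exact_f):.2e}")
```

The relative error of the upper bound grows steadily with the failure rate, which is the behaviour the study is meant to show.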



Algorithm to Determine Minimal Cut Sets 

When the reliability block diagram is small, the set of minimal cuts can be found 
by visual examination. In a large network such an examination could be very 
laborious and perhaps even impossible. Reference 3 describes an algorithm for 
generating the minimal cuts of a network. This procedure is quite suitable for 
computer application. 

The input or supply node of a reliability block diagram will be denoted by $s$
and the output or the load node by $t$. The removal of the components in a
minimal cut separates the network into exactly two connected subnetworks,
one containing the node $s$ and the other node $t$. The minimal cut in other words
partitions the set $N$ of all nodes into subsets $N^+$ and $N^-$. The set $N^+$
defines a connected subnetwork that includes $s$ and $N^-$ defines another
connected subnetwork that includes $t$.

The algorithm generates a tree of the network, the vertices being the points
and the edges being the line segments on the tree. The tree starts from the root
vertex denoted by 0. The edges are marked $n^+$ or $n^-$. The symbol $n^+$ means
that the node $n$ of the reliability diagram is a member of $N^+$ and $n^-$ means
that node $n$ of the reliability diagram is a member of $N^-$. With each vertex $i$
of the tree are associated four subsets of the nodes of the reliability diagram,
$X_{1i}$, $X_{2i}$, $X_{3i}$ and $W_i$. Let $h_i$ be the unique simple path
connecting vertex $i$ to the root vertex, then

Node $n \in X_{1i}$ if an edge in the path $h_i$ is labelled $n^+$

Node $n \in X_{2i}$ if an edge in the path $h_i$ is labelled $n^-$

Node $n \in X_{3i}$ if it is neither in $X_{1i}$ nor in $X_{2i}$ but is a member of $N$.

Node $n \in W_i$ if it is in $X_{3i}$ and if it is the terminal of a component whose
other terminal is a member of $X_{1i}$



[Table 4.2: exact and upper bound approximate results as a function of the
component failure rate; the tabulated values are not recoverable from this copy.]


The algorithm now proceeds as follows:

Step 1: Generate vertices 0, 1 and 2 and label the edges (0, 1) and (1, 2)
as $s^+$ and $t^-$ respectively. This means that for all minimal cuts $s$ and $t$
are members of $N^+$ and $N^-$ respectively. The vertices 0 and 1 are assumed
scanned but vertex 2 is unscanned. Go to step 2.

Step 2: If there are no unscanned vertices, the complete tree has been
generated and the algorithm terminates. Otherwise, choose the unscanned
vertex with the greatest index, mark it scanned and let it be called $i$. Find
the unique simple path $h_i$ from $i$ to the root vertex 0, and identify the subsets
$X_{1i}$, $X_{2i}$, $X_{3i}$ and $W_i$ as defined above. If $W_i$ is null, i.e. has
no member, go to step 7, otherwise choose $x$, an element of $W_i$, and construct
the subnetwork defined by the set of nodes $X_4 = X_{2i} \cup X_{3i} - x$. If this
subnetwork is connected, go to step 3, and if not go to step 4.

Step 3: Generate two new vertices $k$ and $k + 1$ where $k$ is one greater than
the highest index so far in the tree. Create edges $(i, k)$ and $(i, k + 1)$ and
label them $x^+$ and $x^-$. The vertices $k$ and $k + 1$ are unscanned vertices.
Go to step 2.

Step 4: Find the set of nodes $X_5$ which defines a connected subnetwork that
includes $t$. If $X_{2i}$ is a subset of $X_5$ go to step 5, otherwise to step 6.

Step 5: Generate vertex $k$ and edge $(i, k)$ labelled $x^+$. Determine the set
$X_6 = X_4 - X_5$. If $|X_6|$ is the number of the elements of $X_6$, then create
vertices $k + 1, k + 2, \dots, k + |X_6|$ and generate edges $(k, k + 1)$,
$(k + 1, k + 2), \dots, (k + |X_6| - 1, k + |X_6|)$ and label them $z^+$ where
$z \in X_6$. Finally generate vertex $k + |X_6| + 1$ and create edge
$(i, k + |X_6| + 1)$ labelled $x^-$. Go to step 2.

Step 6: Generate one new vertex index $k$ and create the edge $(i, k)$ labelled
$x^-$. Go to step 2.

Step 7: At this step a minimal cut has been generated. The set $N_i^+ = X_{1i}$ and
$N_i^- = N - X_{1i}$. The components whose one terminal is in $N_i^+$ and the other
in $N_i^-$ are the members of the minimal cut. Go to step 2 to create other minimal
cuts.



Fig. 4.12 The minimal cut generation tree



The above algorithm is illustrated by application to the reliability block 
diagram of Fig. 4.8. As the nodes are to be marked by integers the components 
may be represented by letters as shown in Fig. 4.12 which shows the tree for 
this network. The reader is urged to verify this tree by going through the 
different steps of the algorithm. 
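
As an aid to such a verification, the minimal cuts of a small network can also be obtained by brute force from the partition property on which the algorithm rests: a minimal cut separates the nodes into a connected set $N^+$ containing $s$ and a connected set $N^-$ containing $t$. The sketch below is not the tree algorithm described above; it simply tries every candidate partition. The bridge topology and the node names `s`, `a`, `b`, `t` are assumptions for Fig. 4.8, and the function names are hypothetical.

```python
from itertools import combinations

# Assumed topology: component -> the two nodes it joins
COMPONENTS = {1: ("s", "a"), 2: ("s", "b"), 3: ("a", "t"), 4: ("b", "t"), 5: ("a", "b")}
NODES = {"s", "a", "b", "t"}

def connected(nodes):
    """True if the subnetwork defined by `nodes` is connected."""
    nodes = set(nodes)
    start = next(iter(nodes))
    reached, frontier = {start}, [start]
    while frontier:
        u = frontier.pop()
        for x, y in COMPONENTS.values():
            for a, b in ((x, y), (y, x)):
                if a == u and b in nodes and b not in reached:
                    reached.add(b)
                    frontier.append(b)
    return reached == nodes

def minimal_cuts_by_partition():
    """One minimal cut per partition (N+, N-) with both sides connected."""
    inner = sorted(NODES - {"s", "t"})
    cuts = []
    for size in range(len(inner) + 1):
        for extra in combinations(inner, size):
            n_plus = {"s", *extra}
            n_minus = NODES - n_plus
            if connected(n_plus) and connected(n_minus):
                # components with one terminal on each side form the cut
                cut = {c for c, (x, y) in COMPONENTS.items()
                       if (x in n_plus) != (y in n_plus)}
                cuts.append((sorted(n_plus), sorted(cut)))
    return cuts

partition_cuts = minimal_cuts_by_partition()
for n_plus, cut in partition_cuts:
    print("N+ =", n_plus, " cut =", cut)
```

The number of candidate partitions doubles with every additional node, which is why a systematic procedure such as the tree algorithm is preferred for large networks.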




Exercises 

1. Calculate the probability and frequency of system failure in Example 4.2 
using the state space approach. 

2. Enumerate the minimal cut sets for the following reliability block diagram. 




References 

1. R. Billinton and C. Singh, Generating Capacity Reliability Evaluation in
Interconnected Systems Using a Frequency and Duration Approach - Part I.
Mathematical Analysis, IEEE Trans. Power App. and Systems (July/August
1971).

2. J.A. Buzacott, Network Approaches to Finding the Reliability of Repairable
Systems, IEEE Trans. on Reliability, R-19, 4 (November 1970).

3. P.A. Jensen and M. Bellmore, An Algorithm to Determine the Reliability
of a Complex System, IEEE Trans. on Reliability, R-18, 4 (November
1969).

4. C. Singh and R. Billinton, A New Method to Determine the Failure Frequency
of a Complex System, Micro-Electronics and Reliability, Vol. 12, No. 5 (1973).

5. C. Singh, 'Reliability Modelling and Evaluation in Electric Power Systems',
Ph.D. Thesis, University of Saskatchewan, Saskatoon, Canada (August 1972).

6. L.M. Shooman, Probabilistic Reliability, New York, McGraw-Hill (1968). 



CHAPTER 5 

Techniques for Large Systems 



Introduction 

Several mathematical modelling techniques suitable for reliability evaluation 
have been illustrated. Although the problem of system reliability may appear 
conceptually simple, at least when constant transition rates can be assumed, the 
task may become difficult with large and complex systems. In control systems, 
communication networks and power networks, it is sometimes easy to derive 
the reliability block diagrams from the schematic diagrams. When the network 
approach can be used, the problem becomes relatively simple. In certain cases, 
however, it is difficult and often impossible to apply this approach. This is 
especially so with systems involving dependent failure or repair modes and those 
involving graded modes of operation. In such cases the state space approach is 
often the only method available. This chapter will identify problem areas while 
using this approach and suggest suitable techniques for overcoming these 
difficulties. 

The Problem Areas 

It was pointed out in the last chapter that the state space approach essentially 
involves the following steps: 

1 Evolution of the state space and the interstate transition rates. 

In a small system, it is possible to draw the state transition diagram of the 
system and then solve it with a calculator or program it for the digital computer. 
When, however, the state space is large, this procedure becomes impractical and 
often impossible. In such cases this process can be performed by a computer. 
The method was described in the last chapter. The basic idea is to let the states 
sequentially evolve by the realization of each possible transition mode of the 
components. When the transition modes of the components are dependent, 
system states may impose restrictions on some transition modes but in systems 
consisting of independent components, each component is allowed to realize 
all of its transition modes. In certain systems using the symmetry of the 
transition diagrams, special methods may be evolved for generating the state 
space and the interstate transition rates. The main problem in this step is to 
ensure that the correct state space and interstate transition rates have been 
generated. The generation of a correct state transition rate matrix is very
important as this is the foundation for further calculations.

In some cases, the program may be used to generate the transition rate 
matrix of a similar but smaller system. The matrix along with the description 
of states can be printed out and checked visually. For ease in checking, fictitious 
transition rates, usually whole numbers may be used. When this is not possible 
or convenient, the program should have an independent checking subroutine. 
Each system state should be examined for the possible modes of transition. For 
each mode the resulting state description should be constructed and compared 
with the ones which have already been generated. If both the state description 
and the interstate transition rate agree, for all the states, it is reasonable to 
assume that the state transition rate matrix is correct. If at any state there is any
discrepancy, the state and the resulting states should be printed. This will help
in debugging.

The other problem is the size of the state space. As an example, a system
consisting of $n$ independent two state units will have $2^n$ possible states. The
state transition matrix will be of $2^n \times 2^n$ size. The available memory in
the computer may soon be exhausted. This difficulty can be alleviated to some
extent by using the principles of sparsity programming. This problem will,
however, be discussed in more detail later.
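
The growth of the state space, and the saving obtained by storing only the non-zero transition rates, can be seen from a small sketch. It assumes independent two state components, each of which may realize its single transition mode from every system state; the function name and the whole-number rates (chosen fictitiously for ease of checking, as suggested above) are illustrative.

```python
from itertools import product

def build_state_space(rates):
    """States of n independent two state units and a sparse transition rate table.

    rates: list of (failure rate, repair rate), one pair per component.
    Returns (states, transitions) where transitions maps (from, to) -> rate.
    """
    n = len(rates)
    states = list(product((1, 0), repeat=n))          # 1 = up, 0 = down
    index = {state: i for i, state in enumerate(states)}
    transitions = {}
    for state in states:
        for c, (lam, mu) in enumerate(rates):
            nxt = list(state)
            nxt[c] = 1 - state[c]                     # component c changes state
            rate = lam if state[c] == 1 else mu
            transitions[(index[state], index[tuple(nxt)])] = rate
    return states, transitions

rates = [(1, 10), (2, 20), (3, 30)]                   # fictitious whole numbers
states, transitions = build_state_space(rates)

dense_entries = len(states) ** 2
print("states           :", len(states))
print("non-zero entries :", len(transitions))
print("dense matrix size:", dense_entries)
```

Each state has only $n$ outgoing transitions, so the table holds $n\,2^n$ entries against the $2^n \times 2^n$ of the full matrix.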

2 Calculation of the state probabilities 

When the components are independent, the problem becomes simpler, 
since the system state probabilities can be derived from the component state 
probabilities by the simple multiplication rule of probabilities. Since the 
calculation of the probability of each system state can be done independently 
of the other states, evaluation can be made selectively for the states required 
for final calculation of the reliability measures. In the case of dependent
transition modes the probabilities of all the states have to be calculated since
the states cannot be solved independently for probabilities. If time dependent
solutions are required, the state space equation has to be solved. The methods 
of doing it have already been discussed. Most often, however, the steady state 
solution is required. The problem is then reduced to solving a set of 
simultaneous linear equations. This is usually done using the Gauss elimination 
or Gauss-Jordan method. These techniques are explained in Appendix I.
Though conceptually simple, the problem can become formidable even on the 
computer as the number of components becomes large, and the state space 
becomes very large. Computer storage limitations, the errors introduced by 
rounding off and the computation time required all make the problem a very 
difficult one. It should be appreciated that the transition rate matrix contains 
failure rates which are usually very small as compared to the repair rates which 
are comparatively large. The operations on small and large numbers are bound to 
introduce rounding off errors. It can therefore be appreciated that steps have to 
be taken to limit the state space. 
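
As an illustration of the steady state calculation, the sketch below solves the balance equations of a small system by Gauss-Jordan elimination and compares the result with the product rule for independent components. The function `steady_state`, the state numbering and the rates are assumptions chosen for the example. Exact rational arithmetic is used so that rounding off does not enter; a production program working in floating point would not have this luxury.

```python
from fractions import Fraction

def steady_state(rates, n):
    """Solve the steady state balance equations with sum(p) = 1.

    rates[(i, j)] is the transition rate from state i to state j (i != j).
    """
    # Row i holds the balance equation of state i:
    #   sum_j p_j * rate(j -> i) - p_i * (total rate out of i) = 0
    m = [[Fraction(0)] * (n + 1) for _ in range(n)]
    for (i, j), r in rates.items():
        m[j][i] += Fraction(r)
        m[i][i] -= Fraction(r)
    # One balance equation is redundant; replace it by sum(p) = 1.
    m[n - 1] = [Fraction(1)] * (n + 1)
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]

# Two independent components (hypothetical rates).
# State 0: both up, 1: component 1 down, 2: component 2 down, 3: both down.
l1, m1, l2, m2 = 1, 9, 2, 8
rates = {(0, 1): l1, (1, 0): m1, (0, 2): l2, (2, 0): m2,
         (1, 3): l2, (3, 1): m2, (2, 3): l1, (3, 2): m1}
p = steady_state(rates, 4)

# Product rule for independent components
a1, a2 = Fraction(m1, l1 + m1), Fraction(m2, l2 + m2)
product_rule = [a1 * a2, (1 - a1) * a2, a1 * (1 - a2), (1 - a1) * (1 - a2)]
print(p == product_rule)
```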




3 Calculation of the reliability measures 

The two key indices are the probability and frequency of encountering a 
certain configuration of states. The other indices, the mean cycle time and the 
mean down time can be then simply derived. Success or failure may not provide 
a complete description and it may be necessary to evaluate reliability measures 
for graded modes of operation, e.g. in a power system it may be necessary to
calculate the probabilities and frequencies of having different magnitudes of 
capacity deficiencies. Similarly in a transit system it may be necessary to know 
the probabilities and frequencies of having various numbers of vehicles available 
for transportation. The various states are then grouped into subsets denoting a 
particular condition. The probabilities and frequencies of these subsets are then 
calculated using the equations given in Chapter 3. In general all the states have to 
be scanned to classify them into the required subsets and to select the 
appropriate transition rates for frequency calculations. In many cases, the
classification can be done by taking advantage of the systematic pattern of
the state space. But if all the states have to be scanned, it could be a
time-consuming process.

It can be seen from the above discussion that the problem with large systems is 
essentially the size of the state space. In some situations the size of the state 
space does not grow proportionately with the number of components. The 
growth of the state space in such cases is restricted because of the dependency 
considerations. A familiar example is that of a series system of n components 
when the assumption is made that the exposure of components to failure is 
zero when the system is down. In this case the number of states is simply $n+1$
as compared with $2^n$ when the component failures are independent.
In general, the following procedure should be adopted. 

1. The system should be divided into suitable subsystems which can be 
handled conveniently one at a time. A system is usually naturally subdivided 
into subsystems on the layout or functional basis. Most often this natural 
subdivision can be used as the basis for classification for reliability evaluation, 
but it is not necessary to do so. The primary concern is the system effect of
component failures. Every attempt should be made to divide the system into 
independent subsystems. The advantage in doing so is that the probabilities of 
the system can then be found by simple multiplication of the probabilities of 
the states of the subsystems. Another advantage is that the combination of the 
independent subsystems is simpler and the equivalent transition rate concept can 
be more conveniently employed. This will become clearer when the implications
of the equivalent transition rate are considered. The independence of various
subsystems may sometimes be achieved by shifting some components from
one natural subsystem to another. This trick can sometimes prove quite useful.

2. The state space of each subsystem may be reduced either by merging states
or by truncating very low probability states. The principles of both of these
techniques are explained in detail in this chapter.

3. The subsystems should then be combined into a complete system and the 
required reliability measures evaluated. 

Two important concepts in large systems are therefore the merging of the 
states and truncation of states. 

Equivalent Transition Rate and the Conditions of Mergeability 

The equivalent transition rate concept was introduced in Chapter 3 and used for 
independent subsystems in Chapter 4, while using the network reduction 
approach. The principal use of the equivalent transition rate is in reducing the 
system or subsystem state space. The basic idea is to find a state space which is 
equivalent to the original state space but is more convenient to use. Assume that 
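the entire state space is partitioned into $m$ subsets $X_i$, $i = 1, 2, \ldots, m$.
The equivalent transition rate from subset $X_p$ to $X_q$ is
obtained by using Equation (3.37)

$$\lambda_{pq}(t) = \frac{\sum_{i \in X_p} p_i(t) \sum_{j \in X_q} \lambda_{ij}}{\sum_{i \in X_p} p_i(t)} \qquad (5.1)$$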
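
The equivalent transition rate is the probability-weighted average, over the states of the source subset, of the total rate from each state into the destination subset. A minimal sketch of this calculation is given below; the function name `equivalent_rate`, the dictionary layout and the three state numerical example are assumptions made for illustration only.

```python
from fractions import Fraction

def equivalent_rate(p, rates, src, dst):
    """Equivalent transition rate from subset src to subset dst.

    p[i]          : probability of state i at the time of interest
    rates[(i, j)] : transition rate from state i to state j
    """
    flow = sum(p[i] * rates.get((i, j), 0) for i in src for j in dst)
    return flow / sum(p[i] for i in src)

# Hypothetical three state example: states 1 and 2 are lumped, state 3 stands alone.
p = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 4)}
rates = {(1, 3): 2, (2, 3): 6}
rate_12_to_3 = equivalent_rate(p, rates, src={1, 2}, dst={3})
print(rate_12_to_3)
```

Because the state probabilities appear as weights, the result generally changes whenever the probabilities change, which is the point examined in the remainder of this section.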

This section examines the conditions under which this equivalent model will 
give the same results as would be obtained by using the original model. 
Complete knowledge of these conditions is quite important and the lack of 
awareness in this area can lead to gross errors. The concept of equivalent 
transition rate will be examined both in the transient domain and under the 
equilibrium conditions. This concept can be useful in the following ways: 

(a) System State Space Reduction 

Sometimes it may be desirable to reduce the state space of the system by 
merging together certain sets of states. The equivalent transition rates among 
the subsets of the lumped states are required. 

State merging in the system state space is generally of value when the 
equivalent transition rates can be calculated without having to solve for the 
system state probabilities. 

(b) Subsystem State Space Reduction 

The most practical method is to break down the system into subsystems 
which can be individually solved and then to combine these solutions to get the 
results for the entire system. It may be desirable to reduce the state space of an 
individual subsystem and therefore the equivalent transfer rate will be calculated. 




Model reduction in this case is useful even if all of the subsystem state 
probabilities have to be calculated to determine the equivalent transfer rates. 



Transient Domain 

The equivalent transfer rate is first examined from the point of view of system 
state space reduction. The equivalent transition rate from subset $X_p$ to $X_q$ is
given by Equation (5.1).

The following observations can be made from the relationship in (5.1): 

1. Since the state probabilities in Equation (5.1) depend upon the initial state
probability vector, the equivalent transfer rate $\lambda_{pq}(t)$ is a function of the initial
state probability vector. In contrast to this the interstate transition rates $\lambda_{ij}$ of
the original state space are independent of any such condition. If the $\lambda_{pq}(t)$ are to
be independent of such a restriction they must then be independent of the state
probabilities. This can happen if $\sum_{j \in X_q} \lambda_{ij}$ is the same for all $i \in X_p$, so that

$$\lambda_{pq} = \sum_{j \in X_q} \lambda_{ij}, \qquad i \in X_p \qquad (5.2)$$



2. Unless the expression for equivalent transfer rate is independent of the
state probabilities as in Equation (5.2), $\lambda_{pq}(t)$ is obviously a function of $t$. If an
explicit expression for $\lambda_{pq}(t)$ can be obtained, the solution for the reduced state
space can be obtained by using Kolmogorov differential equations for the time
specific transition rate matrix

$$A(u)\,P(t,u) = \frac{\partial P(t,u)}{\partial u} \qquad (5.3)$$

where for $u > t$

$P(t,u)$ = The column matrix whose $i$th value $p_i(t,u)$ represents the
probability of being in state $i$ at time $u$ for the given initial
condition at $t$, i.e.

$$p_i(t,u) = P\{Z(u) = i \mid Z(t) = j\}$$



and $A(u)$ = The transpose of the time specific transition rate matrix. When
the initial conditions are specified at the time of origin

$$A(u)\,P(0,u) = \frac{\partial P(0,u)}{\partial u} \qquad (5.4)$$

When Equation (5.4) is used for the reduced system, the initial condition
$Z(0)$ should be the same as for the $\lambda_{pq}(t)$ in (5.1).

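Writing the differential equation for the lumped state $q$

$$\frac{\partial p_q(0,u)}{\partial u} = \sum_{p \in X_r} \lambda_{pq}(u)\, p_p(0,u) - p_q(0,u) \sum_{p \in X_r} \lambda_{qp}(u) \qquad (5.5)$$

where $X_r$ = {All states except $q$}, the subscript $r$ indicating that the reference is
to the reduced state space.

Since the initial condition for (5.1) and (5.4) is the same

$$p_p(0,u) = \sum_{i \in X_p} p_i(u)$$

and

$$\frac{\partial p_q(0,u)}{\partial u} = \sum_{j \in X_q} p_j'(u)$$

The indices $p$ and $q$ refer to the lumped states whereas $i$ and $j$ indicate the
original states. Substituting these values into Equation (5.5) and substituting the
values of $\lambda_{pq}(t)$ from Expression (5.1)

$$\sum_{j \in X_q} p_j'(u) = \sum_{p \in X_r} \sum_{i \in X_p} \sum_{j \in X_q} p_i(u)\,\lambda_{ij} - \sum_{p \in X_r} \sum_{j \in X_q} \sum_{i \in X_p} p_j(u)\,\lambda_{ji}$$

Now if $X^+ = X_q$ and $X^-$ is the disjoint subset, then the above equation can be
written as

$$\sum_{j \in X^+} p_j'(u) = \sum_{j \in X^+} \sum_{i \in X^-} p_i(u)\,\lambda_{ij} - \sum_{j \in X^+} \sum_{i \in X^-} p_j(u)\,\lambda_{ji}$$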

It can be seen that this is the same expression as would be obtained by 
following the argument used to derive Equation (3.7). The time specific 
equivalent transfer rates, therefore, do represent the process accurately but 
the essential point is that this does not solve anything because all the state 
probabilities have to be found before the equivalent transfer rates can be 
determined. The only case when the state probabilities do not have to be 
evaluated is the one given by Equation (5.2). In summary, mergeability from a 
systems state space viewpoint can be defined as follows: 




Definition: If the entire state space is partitioned into $m$ subsets $X_i$, $i = 1, 2, \ldots, m$,
and the equivalent transfer rate from subset $X_p$ to $X_q$, given by Equation (5.1),
is time invariant and independent of the initial state probability vector, then the
state space is said to be mergeable into the said partition.

The necessary and sufficient condition is that the transition rate from each
state in subset $X_p$ to each of the states in $X_q$, when summed over all the states
in $X_q$, is the same for each state in $X_p$, and the required equivalent rate is given
by

$$\lambda_{pq} = \sum_{j \in X_q} \lambda_{ij}, \qquad i \in X_p$$
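
The mergeability condition lends itself to a mechanical check. The sketch below tests, for every ordered pair of distinct subsets, whether the summed rate into the destination subset is the same from every state of the source subset, and returns the lumped rates when it is. The function `is_mergeable` and the two identical unit example (failure rate 1, repair rate 10) are assumptions introduced for illustration.

```python
def is_mergeable(rates, partition):
    """Check the mergeability condition for a partition of the state space.

    rates[(i, j)] : transition rate from state i to state j
    partition     : list of disjoint sets of states
    Returns (True, lumped_rates) or (False, None); lumped_rates maps
    (index of source subset, index of destination subset) -> rate.
    """
    lumped = {}
    for a, src in enumerate(partition):
        for b, dst in enumerate(partition):
            if a == b:
                continue
            sums = {sum(rates.get((i, j), 0) for j in dst) for i in src}
            if len(sums) != 1:          # the sum differs between states of src
                return False, None
            rate = sums.pop()
            if rate:
                lumped[(a, b)] = rate
    return True, lumped

# Two identical units: 1 = both up, 2 = unit 1 down, 3 = unit 2 down, 4 = both down.
lam, mu = 1, 10
rates = {(1, 2): lam, (1, 3): lam, (2, 1): mu, (3, 1): mu,
         (2, 4): lam, (3, 4): lam, (4, 2): mu, (4, 3): mu}

ok_by_count, lumped = is_mergeable(rates, [{1}, {2, 3}, {4}])
ok_up_down, _ = is_mergeable(rates, [{1, 2, 3}, {4}])
print(ok_by_count, lumped, ok_up_down)
```

Grouping by the number of failed units satisfies the condition, whereas the up/down grouping does not, because state 1 has no direct transition to state 4 while states 2 and 3 do.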

When examining mergeability from the point of view of subsystem reduction,
the state space of a subsystem $S_a$ is assumed to be partitioned into subsets $X^a_1$,
$X^a_2$, ..., $X^a_p$ in such a manner that no information about the reliability analysis is
lost. The equivalent transfer rate from one subset to another can be obtained
using Equation (5.1). The important consideration for the present purpose,
however, is that this equivalent rate should hold when this subsystem is
combined with another subsystem. The equivalent transfer rate from the
lumped states of subset $X^a_l$ to those of $X^a_m$ is given by

$$\lambda^a_{lm}(t) = \frac{\sum_{i \in X^a_l} p_i(t) \sum_{j \in X^a_m} \lambda_{ij}}{\sum_{i \in X^a_l} p_i(t)}$$




After merging together the states in each subset, there will be $p$ equivalent
states in the subsystem $S_a$, one equivalent state corresponding to one subset.
This subsystem is now combined with another subsystem $S_b$ with equivalent
states $b_1, b_2, \ldots, b_q$. It is assumed that in the combined system there will be
$p \times q$ states, there being a whole set of equivalent states $a_u$, $u = 1, \ldots, p$,
corresponding to every equivalent state in subsystem $b$. This can be indicated as
follows

$(b_1)a_1, (b_1)a_2, \ldots, (b_1)a_p$
$(b_2)a_1, (b_2)a_2, \ldots, (b_2)a_p$
...
$(b_q)a_1, (b_q)a_2, \ldots, (b_q)a_p$

It should be appreciated that this arrangement may not always be possible
as some combinations may be incompatible. In the subsystems exposed to a
common two state fluctuating environment, which may alter the component
failure and repair rates, such an arrangement will exist in either state. Only the 
arrangement shown above will be discussed but the results will also hold for 
other cases. After combination of the two subsystems, certain interstate 
transition rates may be altered but it is assumed that the interstate transition 
rates with respect to the merged states are not changed so that the act of 
combining the subsystems does not affect the information contained in the 
merged states. 

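For the transfer rates to hold after combining, the equivalent transfer rate
from $(b_u)a_l$ to $(b_u)a_m$ should be the same as that from $a_l$ to $a_m$. Here $u$ may be
any number from 1 to $q$. Therefore, for mergeability

$$\lambda^{ab}_{lm}(t) = \frac{\sum_{i \in X^a_l} p^u_i(t) \sum_{j \in X^a_m} \lambda_{ij}}{\sum_{i \in X^a_l} p^u_i(t)} = \frac{\sum_{i \in X^a_l} p_i(t) \sum_{j \in X^a_m} \lambda_{ij}}{\sum_{i \in X^a_l} p_i(t)} \qquad (5.6)$$

where

$p^u_i(t)$ = The probability, for the given initial state, of being in state $i \in X^a_l$
corresponding to the condition represented by $b_u$ in subsystem $S_b$.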

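The following conclusions can be drawn from the Equality (5.6):

(1) For independent subsystems

$$p^u_i(t) = p_i(t)\, p_{b_u}(t)$$

where

$p_{b_u}(t)$ = The probability of being in the equivalent state $b_u$ for the given
initial condition

Therefore

$$\lambda^{ab}_{lm}(t) = \frac{\sum_{i \in X^a_l} p_i(t)\, p_{b_u}(t) \sum_{j \in X^a_m} \lambda_{ij}}{\sum_{i \in X^a_l} p_i(t)\, p_{b_u}(t)} = \frac{\sum_{i \in X^a_l} p_i(t) \sum_{j \in X^a_m} \lambda_{ij}}{\sum_{i \in X^a_l} p_i(t)}$$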




The time specific equivalent transfer rate, therefore, holds unconditionally 
if the subsystems being combined are independent. The evaluation of these rates 
is, of course, a separate problem. 

(2) If, however, the subsystems are not independent, the Equality (5.6) can hold
if

(i) $\sum_{j \in X^a_m} \lambda_{ij}$ is the same for any $i$, in which case

$$\lambda^{ab}_{lm} = \lambda^a_{lm} = \sum_{j \in X^a_m} \lambda_{ij}$$

This is the same condition as previously specified for mergeability.

(ii) Alternatively if for a given initial state probability vector

$$p^u_1(t) = p^u_2(t) = \cdots = p^u_i(t)$$

which implies

$$p_1(t) = p_2(t) = \cdots = p_i(t)$$

i.e., if the states being lumped have the same time specific probability

$$\lambda^{ab}_{lm} = \lambda^a_{lm} = \frac{1}{n_l} \sum_{i \in X^a_l} \sum_{j \in X^a_m} \lambda_{ij}$$

where $n_l$ = The number of states in subset $X^a_l$

Steady state is a special case of transient analysis with $t \to \infty$. The analysis
for steady state conditions of mergeability is the same as that for the transient 
case with the following differences: 

1. The steady state equivalent transfer rate is time invariant. 

2. The steady state probabilities are independent of the initial conditions and 
therefore the steady state equivalent transfer rate is independent of such a 
restriction. 

The results of the mergeability analysis are summarized below. 

1. If $\sum_{j \in X^a_m} \lambda_{ij}$ is the same for any $i \in X^a_l$, then the equivalent transfer rate is time
invariant and independent of the initial condition. When this condition is
satisfied between all the subsets taken in pairs, the Markov process of the
subsystem $S_a$ is said to be mergeable into these disjoint subsets. The equivalent
transfer rates so determined hold when this subsystem is combined with another
subsystem $S_b$ provided the interstate transition information contained in the
lumped states does not change by virtue of this combination.

2. If for the given initial condition, the probabilities of the states in a subset are 
equal, then these states are mergeable and the equivalent transition rate holds 
when this subsystem is combined with another subsystem, provided the initial 
condition is not violated. In the steady state case, the initial condition does not 
affect the conclusion. 

3. For independent subsystems, the states of the subsystem may be lumped 
into any desired disjoint subsets. The equivalent transition rate is unaltered by 
combination but because of the computation effort required, this does not seem 
to be of much significance in the time specific analysis. In steady state analysis, 
this can facilitate the calculation of the frequency index. A familiar example is 
the lumping together of the identical capacity outage states in a generation 
system model before combining it with the load model. 

Example 5.1: The concepts outlined in the previous section can be illustrated 
for the steady state condition using the simple system shown in Fig. 5.1.

Fig. 5.1 A simple three unit series-parallel system



The system consists of two lines 1 and 2 in parallel. This combination is then
in series with component 3. Both lines 1 and 2 have identical failure and repair
rates $\lambda$ and $\mu$ and each is capable of supplying the full load. The failure and repair
rates of line 3 are $\lambda_3$ and $\mu_3$ respectively. The system is now assumed to consist
of subsystem $S_a$ of lines 1 and 2 and the subsystem $S_b$ of line 3. The state
transition diagrams of the two subsystems taken individually are shown in Figs.
5.2 (a) and (b) respectively. On combining $S_a$ and $S_b$, there will be a total of
eight states. These can be written as

(3U)1, (3U)2, (3U)3, (3U)4
(3D)1, (3D)2, (3D)3, (3D)4

Fig. 5.3 The state transition diagram after combining $S_a$ and $S_b$

Fig. 5.4 The state transition diagram after combining reduced $S_a$ with $S_b$

The condition inside the parentheses is that of $S_b$ and the outside numbers refer
to the state numbers of $S_a$. Only one line out of {1,2} is required for successful
operation and therefore the behaviour of $S_a$ can be represented by a two state
component in which state $l$ corresponds to states {1,2,3} and state $m$ to state
{4} of the original state space. The equivalent transition rates are

$$\lambda^a_{lm} = \frac{\lambda\,(p_2 + p_3)}{p_1 + p_2 + p_3}$$

and

$$\lambda^a_{ml} = 2\mu$$

The question to be examined is, do these equivalent transfer rates hold after 
combining $S_a$ and $S_b$? This problem is considered for both complete and
restricted repair facilities. 



1 Complete Repair Facilities 

When each component can be repaired independently, the two subsystems 
are independent. It can also be observed that when the two subsystems are 
independent, the interstate transition modes remain unchanged after interaction. 
Now, if merging is to be valid, the equivalent transition rate $\lambda^{ab}_{lm}$, i.e. from (3U)l
to (3U)m, should be the same as $\lambda^a_{lm}$, i.e. from $l$ to $m$. Examining the state
transition diagram in Fig. 5.3

$$\lambda^{ab}_{lm} = \frac{\lambda\,(p^u_2 + p^u_3)}{p^u_1 + p^u_2 + p^u_3}$$

Here the superscript refers to the condition of system $b$. Since the subsystems
are independent

$$\lambda^{ab}_{lm} = \frac{(p_2\,p_{3u} + p_3\,p_{3u})\,\lambda}{p_1\,p_{3u} + p_2\,p_{3u} + p_3\,p_{3u}} = \frac{(p_2 + p_3)\,\lambda}{p_1 + p_2 + p_3}$$

It can be concluded from the above discussion that when $S_a$ and $S_b$ are
independent, subsystem S a can be represented by a two state component 
having the equivalent transition rates determined by Equation (5.1). This is the 
basis of the network reduction technique described in the previous chapter. 
The reason for assuming the subsystems to be independent can now be more 
fully understood. 
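
The cancellation of the probability of line 3 being up can be checked numerically. The sketch below uses assumed rates (not taken from the text) for the three lines and exact fractions; the variable names are chosen for this illustration only.

```python
from fractions import Fraction

lam, mu = Fraction(1), Fraction(9)      # lines 1 and 2 (assumed values)
lam3, mu3 = Fraction(1), Fraction(4)    # line 3 (assumed values)

a = mu / (lam + mu)                     # availability of line 1 or line 2
# States of S_a: 1 = both up, 2 = line 1 down, 3 = line 2 down, 4 = both down
p = {1: a * a, 2: (1 - a) * a, 3: a * (1 - a), 4: (1 - a) ** 2}
p3u = mu3 / (lam3 + mu3)                # probability that line 3 is up

# S_a alone, l = {1, 2, 3} and m = {4}: states 2 and 3 reach state 4 at rate lam.
rate_a = lam * (p[2] + p[3]) / (p[1] + p[2] + p[3])

# Combined system with line 3 up: independence gives p_i^u = p_i * p3u.
pu = {i: p[i] * p3u for i in p}
rate_ab = lam * (pu[2] + pu[3]) / (pu[1] + pu[2] + pu[3])

print(rate_a, rate_ab)
```

The two rates agree for any choice of the line 3 parameters, since `p3u` is a common factor of numerator and denominator. The same equivalent rate also reproduces the frequency balance between the lumped states $l$ and $m$ of $S_a$.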




2 Restricted Repair Facilities 

Now suppose that only two lines can be repaired at a time. Then, in the event
of failure of all three lines, state (3D)4 in Fig. 5.3, the repair of one line
out of {1,2} must wait for the first repair. Referring to Fig. 5.4, $\lambda_{ml}$ from
(3U)m to (3U)l is still $2\mu$ but $\lambda_{ml}$ from (3D)m to (3D)l is now $\mu$. The subsystems
are no longer independent and therefore $p^u_i \neq p_i\,p_{3u}$, and consequently
$\lambda^{ab}_{lm} \neq \lambda^a_{lm}$. Merging into the above groupings is not possible.

If, however, states {2,3} are merged to give the equivalent state $l$ and if {1}
and {4} are denoted by $n$ and $m$ respectively, the conditions of mergeability
are satisfied and it can be seen that

$$\lambda^{ab}_{ln} = \lambda^a_{ln} = \mu$$

and

$$\lambda^{ab}_{lm} = \lambda^a_{lm} = \lambda$$

Merging into these states is therefore still possible.



Components Subject to Fluctuating Environment 

The following considers the application of conditions of mergeability to a 
system consisting of components exposed to a two state fluctuating 
environment. The two states are designated as N and S states and the 
durations of these states are assumed to be exponentially distributed. The 
component failure and repair rates are constant but depend on the environment. 
Failure and repair of the components are independent in a given environment 
but the exposure to the common environment introduces the element of 
interdependence and the probabilities cannot be found by a simple product 
rule. The conditions of mergeability developed in this chapter will now be 
applied to this system. 

The system is assumed to be divided into a number of subsystems out of
which only subsystems $S_a$ and $S_b$ are considered for the sake of convenience.
These subsystems are assumed to contain $n_a$ and $n_b$ components
respectively. Considering $S_a$, it can exist in $2^{n_a}$ states in each of the
environmental states. The component configuration of each state in either of
the two environment states is the same but the interstate transition rates in the
two weather states are different. The transition rate from a system state in the
$N$ environment condition to the corresponding system state in the $S$ environment
condition is taken as $n$ and in the reverse direction as $m$. The states in the $N$
environment may be grouped into subsets $X^a_1$, $X^a_2$, ..., $X^a_p$ and those in the $S$
environment as $X^{as}_1$, $X^{as}_2$, ..., $X^{as}_p$. The combination of states in $X^a_i$ and $X^{as}_i$ is the
same. Since the reduced state space of $S_a$ is to be combined with the reduced
state space of $S_b$ which is also exposed to the same fluctuating environment, the
states in the $N$ and $S$ environment cannot be lumped together. The equivalent
transfer rates can be found by the application of Equation (5.1). It is obvious
that the transition rate from $X^a_i$ to $X^{as}_i$ is $n$ and that from $X^{as}_i$ to $X^a_i$ is $m$. The
equivalent transfer rate from $X^a_l$ to $X^a_m$ is given by

$$\lambda^a_{lm} = \frac{\sum_{i \in X^a_l} p_i \sum_{j \in X^a_m} \lambda_{ij}}{\sum_{i \in X^a_l} p_i}$$

where

$p_i$ = The probability of being in the $i$th state in $N$ environment

and

$\lambda^a_{lm}$ = The equivalent transfer rate from the lumped states of subset $X^a_l$
to those of $X^a_m$

The same treatment holds for the states in the $S$ environment. After merging
together the states in each subset, there will be $p$ equivalent states in each
environment condition in the subsystem $S_a$, one equivalent state corresponding
to one subset. These equivalent states may be indicated by $a_1, a_2, \ldots, a_p$ in the
$N$ environment condition, and $a^s_1, a^s_2, \ldots, a^s_p$ in the $S$ environment condition.
Now suppose that this subsystem is combined with the other subsystem $S_b$ with
equivalent states $\{b_1, b_2, \ldots, b_q\}$ and $\{b^s_1, b^s_2, \ldots, b^s_q\}$ in the $N$ and $S$
environment conditions. In the combined system there will be $p \times q$ states in each
environment condition. This combination for the $N$ environment is shown below.

$(b_1)a_1, (b_1)a_2, \ldots, (b_1)a_p$
$(b_2)a_1, (b_2)a_2, \ldots, (b_2)a_p$
...
$(b_q)a_1, (b_q)a_2, \ldots, (b_q)a_p$

Since the two subsystems are exposed to the same fluctuating environment, it
can be easily seen that the equivalent transfer rate from $(b_i)a_j$ to $(b^s_i)a^s_j$ is $n$ and
that from $(b^s_i)a^s_j$ to $(b_i)a_j$ is $m$. For steady state mergeability, the transfer rate
from $(b_u)a_l$ to $(b_u)a_m$ should be the same as that from $a_l$ to $a_m$, i.e.,

$$\lambda^{ab}_{lm} = \lambda^a_{lm}$$

By the applications of the mergeability conditions the following conclusions 
can be drawn: 

1. If $S_a$ and $S_b$ are independent, the equivalent transfer rates hold
unconditionally, i.e. the states of the subsystem may be lumped into any
desired disjoint subsets. When $S_a$ and $S_b$ are exposed to the same fluctuating
environment, the independence is possible if either $S_a$ or $S_b$ or both subsystems
have components whose failure and repair rates are the same in both environment
states, i.e. these rates are environment independent.

2. In subsystems which are not independent, the states can be lumped together
if, and only if, either

(i) All the states being merged have the same availability, or

(ii) The sum of transfer rates from state $i \in X^a_l$ to all states $j \in X^a_m$, i.e.,
$\sum_{j \in X^a_m} \lambda_{ij}$, is the same for all $i$.

The conditions of steady state mergeability restrict the scope of model 
reduction when the subsystems are not independent but even in this restricted 
sense model reduction can be of considerable help in large sized networks. 

One obvious application is when the elements of a subsystem have identical 
failure and repair rates. An n identical element subsystem can be lumped into 
2(n+l) states. For example, a four element subsystem will have the following 
groups of identical states. 

Group number    Number of elements failed    Number of identical states in group

1               0                            $\binom{4}{0} = 1$
2               1                            $\binom{4}{1} = 4$
3               2                            $\binom{4}{2} = 6$
4               3                            $\binom{4}{3} = 4$
5               4                            $\binom{4}{4} = 1$

Total                                        16



The 16 states for each environment state can be lumped into five equivalent 
states. Therefore, the subsystem having 32 states can be adequately represented 
by 10 states. This application to identical elements subsystems becomes 
important when it is realized that parallel facilities normally have identical 
failure and repair parameters. 
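
The grouping of identical states can be reproduced by direct enumeration. The following sketch, in which the function name `group_by_failures` and the 0/1 encoding are assumptions for illustration, groups the $2^n$ states of an $n$ identical element subsystem by the number of failed elements and compares the group sizes with the binomial coefficients.

```python
from itertools import product
from math import comb

def group_by_failures(n):
    """Group the 2**n up/down states of n identical elements by number failed."""
    groups = {k: [] for k in range(n + 1)}
    for state in product((0, 1), repeat=n):    # 1 = element failed
        groups[sum(state)].append(state)
    return groups

groups = group_by_failures(4)
sizes = [len(groups[k]) for k in range(5)]
lumped_states_two_environments = 2 * len(groups)
print(sizes, sum(sizes), lumped_states_two_environments)
```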

The merging technique can be illustrated by application to the simple system 
shown in Fig. 5.5. The data for this system is given below. The N environment 
corresponds to normal weather and the S environment to stormy or adverse 
weather. 

Fig. 5.5 A simple transmission network

Mean duration N = 200 hours
Mean duration S = 1.5 hours
Percentage of failures during S environment = 20%

Components    Average failure rate per year    Mean down time, hours

1,2           0.5                              5.0
3,4           1.0                              10.0

This system can be split into two subsystems A and B having {1,2} and {3,4}
components. The state transition diagram of system $i$ (= a or b) is shown in
Fig. 5.6.

Fig. 5.6 Two identical unit state transition diagram
for a two state fluctuating environment.
[States in normal weather (N.W.): 0 (0 down), 1 (1 down), 2 (1 down), 3 (2 down).
States in stormy weather (S.W.): 4 (0 down), 5 (1 down), 6 (1 down), 7 (2 down).
Corresponding N.W. and S.W. states are linked by the rates n and m.]



The following notation is used

$m = 1/S$ = Transition rate from stormy state (SW) to normal state (NW).

$n = 1/N$ = Transition rate from normal state to stormy state.

$\mu_i$ = The mean repair rate of a component in the $i$th system. The same
repair rate is assumed in normal and stormy states.

$\lambda_i$ = Normal state failure rate of a component in the $i$th system.

$\lambda'_i$ = Stormy state failure rate of a component in the $i$th system.



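The average failure rate $\lambda_{av}$ is given by

$$\lambda_{av} = \lambda\,\frac{N}{N+S} + \lambda'\,\frac{S}{N+S}$$

If the number of failures in the stormy weather is $x$ percent

$$\frac{\lambda' S}{\lambda N + \lambda' S} = \frac{x}{100}$$

The values of $\lambda$ and $\lambda'$ can be determined using Equations (5.7) and (5.8).

$$\lambda = \lambda_{av}\,\frac{N+S}{N}\left(1 - \frac{x}{100}\right) \qquad (5.7)$$

$$\lambda' = \lambda_{av}\,\frac{N+S}{S}\,\frac{x}{100} \qquad (5.8)$$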
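
Solving the average failure rate relation and the stormy weather percentage relation simultaneously gives the normal and stormy state failure rates. The sketch below applies this to components 1 and 2 of the example data; the function name `weather_failure_rates` is an assumption for illustration, and only the ratio of the durations matters, so the rates stay in the units of the average rate (per year).

```python
def weather_failure_rates(lam_av, n_dur, s_dur, x_percent):
    """Split an average failure rate into normal and stormy state rates.

    lam_av    : average failure rate
    n_dur     : mean duration of the normal state
    s_dur     : mean duration of the stormy state (same units as n_dur)
    x_percent : percentage of failures occurring in the stormy state
    """
    total = lam_av * (n_dur + s_dur)
    lam_normal = total * (1 - x_percent / 100) / n_dur
    lam_stormy = total * (x_percent / 100) / s_dur
    return lam_normal, lam_stormy

# Components 1 and 2: 0.5 failures per year, N = 200 h, S = 1.5 h, 20% in storms
lam_n, lam_s = weather_failure_rates(0.5, 200.0, 1.5, 20.0)
print(lam_n, lam_s)

# Recombining the two rates returns the average failure rate.
lam_check = lam_n * 200.0 / 201.5 + lam_s * 1.5 / 201.5
```

The stormy state rate comes out roughly thirty times the normal state rate, which is why the short stormy periods cannot be ignored.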



Subsystem Reduction 

In subsystem i, states 1, 2 and 5, 6 are identical and therefore have equal
availabilities. Condition 2(i) is thus satisfied and these states can be lumped
together. The reduced model of subsystem A is shown in Fig. 5.7. This reduced
model can be combined with the reduced model of system B. The resulting
state transition diagram is shown in Fig. 5.8. In this diagram, NW and SW stand
for normal weather and stormy weather states and Xj (X = A or B and j =
0, 1, 2) stands for system X with j components down. It can be seen that in the
reduced model there are only 18 states as against 32 states in the original model.

Fig. 5.7 The reduced model of subsystem A
[States in normal weather (N.W.): 0 (0 down), 1 (1 down), 2 (2 down).
States in stormy weather (S.W.): 3 (0 down), 4 (1 down), 5 (2 down).]

The steady state equations can be written in the form

TP = B (5.9)

where

$$T = \begin{bmatrix} T_{11} & T_{12} & T_{13} \\ T_{21} & T_{22} & T_{23} \\ T_{31} & T_{32} & T_{33} \end{bmatrix}$$

Each block is a 6x6 matrix. The off-diagonal blocks are

$$T_{12} = \mu_b\,[I], \qquad T_{13} = 0, \qquad T_{23} = 2 \times T_{12}, \qquad T_{31} = 0$$

$$T_{21} = \begin{bmatrix} 2\lambda_b\,[I] & 0 \\ 0 & 2\lambda'_b\,[I] \end{bmatrix}, \qquad T_{32} = \begin{bmatrix} \lambda_b\,[I] & 0 \\ 0 & \lambda'_b\,[I] \end{bmatrix}$$

where $[I]$ is the unit matrix of the appropriate size (6x6 in $T_{12}$, 3x3 in $T_{21}$ and $T_{32}$).
Since every column of $T$ must sum to zero, the diagonal blocks are

$$T_{11} = T_a - T_{21}, \qquad T_{22} = T_a - T_{12} - T_{32}, \qquad T_{33} = T_a - T_{23}$$

where

$$T_a = \begin{bmatrix}
-(2\lambda_a + n) & \mu_a & 0 & m & 0 & 0 \\
2\lambda_a & -(\mu_a + \lambda_a + n) & 2\mu_a & 0 & m & 0 \\
0 & \lambda_a & -(2\mu_a + n) & 0 & 0 & m \\
n & 0 & 0 & -(2\lambda'_a + m) & \mu_a & 0 \\
0 & n & 0 & 2\lambda'_a & -(\mu_a + \lambda'_a + m) & 2\mu_a \\
0 & 0 & n & 0 & \lambda'_a & -(2\mu_a + m)
\end{bmatrix}$$

and

B = A column matrix with zero entries

$$P = \{p_{(0,0)N},\, p_{(0,1)N},\, \ldots,\, p_{(2,0)S},\, p_{(2,1)S},\, p_{(2,2)S}\}$$

In the triple subscript of $p$ the first two terms inside the parentheses indicate
the state of the components in subsystems B and A respectively, e.g. (0,1)
denotes no component down in subsystem B and one component down in
subsystem A. The subscript outside the parentheses indicates the environment
state. Any seventeen equations of (5.9) with

$$\sum_{i=0}^{2} \sum_{j=0}^{2} \sum_{k=N,S} p_{(i,j)k} = 1.0$$

may be solved to obtain the vector P of steady state availabilities. The results
obtained by solving this set of equations are shown in Table 5.1. In column (1),
the first number inside the parentheses indicates the number of failed
components of subsystem B and the second in A. In column (2), the availabilities
of the normal and stormy states are merged together. Column (3) gives the
number of identical states in each environment state in the original model. The
values of column (4) are obtained by dividing the values in column (2) by those
in column (3). The results in column (5) are obtained by the analysis of the
complete model of 32 states. It can be seen that the results in columns (4) and
(5) have only very slight differences due to rounding off. Complete information
about the 32 states of the original system can thus be obtained by analyzing
the reduced model of 18 states.

Fig. 5.8 The state transition diagram of the system shown in Fig. 5.5

Table 5.1 Comparison of the results obtained from the reduced model and the
original model of the system shown in Fig. 5.5

Description     Availability       Number of states     Availability       Availability of one
of the          of the             lumped in each       of one             identical state obtained
lumped state    lumped state       environment state    identical state    from the non-reduced model
(1)             (2)                (3)                  (4)                (5)

(0,0)           0.997155           1                    0.997155           0.997156
(0,1)           0.567732x10^-3     2                    0.283866x10^-3     0.283866x10^-3
(0,2)           0.172580x10^-6     1                    0.172580x10^-6     0.172580x10^-6
(1,0)           0.227221x10^-2     2                    0.113610x10^-2     0.113611x10^-2
(1,1)           0.233858x10^-5     4                    0.584645x10^-6     0.584645x10^-6
(1,2)           0.203460x10^-8     2                    0.101730x10^-8     0.101729x10^-8
(2,0)           0.212768x10^-5     1                    0.212768x10^-5     0.212769x10^-5
(2,1)           0.616139x10^-8     2                    0.308069x10^-8     0.308070x10^-8
(2,2)           0.961824x10^-11    1                    0.961824x10^-11    0.961873x10^-11
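
The arithmetic behind columns (4) and (5) can be retraced from the tabulated values. The sketch below divides column (2) by column (3) and reports the largest relative difference from column (5); the dictionary layout and variable names are assumptions for illustration, while the numbers are those of Table 5.1.

```python
# (description): (column 2, column 3, column 5) as given in Table 5.1
table = {
    (0, 0): (0.997155,     1, 0.997156),
    (0, 1): (0.567732e-3,  2, 0.283866e-3),
    (0, 2): (0.172580e-6,  1, 0.172580e-6),
    (1, 0): (0.227221e-2,  2, 0.113611e-2),
    (1, 1): (0.233858e-5,  4, 0.584645e-6),
    (1, 2): (0.203460e-8,  2, 0.101729e-8),
    (2, 0): (0.212768e-5,  1, 0.212769e-5),
    (2, 1): (0.616139e-8,  2, 0.308070e-8),
    (2, 2): (0.961824e-11, 1, 0.961873e-11),
}

# Column (4): availability of one identical state from the reduced model
column4 = {k: lumped / count for k, (lumped, count, _) in table.items()}

# Largest relative difference between the reduced and the complete model
max_rel_diff = max(abs(column4[k] - full) / full
                   for k, (_, _, full) in table.items())

# The lumped availabilities of column (2) should add up to one.
total = sum(lumped for lumped, _, _ in table.values())
print(max_rel_diff, total)
```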

State Space Truncation 

It has been seen that the state space can be reduced by merging certain groups
of states. Another technique is to truncate the state space, i.e. to neglect
the states whose contribution to the measures of system reliability is
insignificant. In systems consisting of independent components, the probability
of each state can be calculated individually as the product of the individual
component probabilities. The states required for determining the reliability
measures are selected, their probabilities calculated and the reliability measures
obtained. The system states which make a negligible contribution to the final
results can be neglected.
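For independent components this procedure can be sketched in a few lines. The example below is illustrative only: the function name, the five components and the unavailability of 0.01 per component are assumptions, not data from the text.

```python
from itertools import combinations
from math import prod

def truncated_state_probs(unavail, max_down):
    """Probabilities of all states with at most `max_down` failed components.

    `unavail[i]` is the (assumed) unavailability of independent component i.
    Each state is identified by the tuple of indices of its failed components.
    """
    n = len(unavail)
    probs = {}
    for k in range(max_down + 1):
        for down in combinations(range(n), k):
            # Product of the individual component probabilities.
            probs[down] = prod(
                unavail[i] if i in down else 1.0 - unavail[i] for i in range(n)
            )
    return probs

u = [0.01] * 5                      # hypothetical component unavailabilities
kept = truncated_state_probs(u, 2)  # keep states with 0, 1 or 2 failures
neglected = 1.0 - sum(kept.values())
print(len(kept), neglected)
```

The quantity `neglected` is the total probability of the truncated states, which gives a direct measure of how much is lost by the chosen truncation level.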

When dependent transition modes are involved, the system state probabilities 
cannot be obtained directly and the set of differential or linear algebraic 
equations must be solved depending on whether time specific or steady state 
solutions are required. Consider first the steady state condition. 

The philosophy behind truncation may be understood by examining the
following equation for calculating the probability of the $i$th state

$$p_i = \sum_{k \in X,\, k \neq i} p_k \lambda_{ki} \Big/ \sum_{k \in X,\, k \neq i} \lambda_{ik}$$

where $\lambda_{ki}$ is the transition rate from state $k$ to state $i$.
The contribution to $p_i$ by a state $k \neq i$ is

$$p_k \lambda_{ki} \Big/ \sum_{j \in X,\, j \neq i} \lambda_{ij}$$

i.e. the frequency of encountering $i$ from $k$ divided by the total transition rate
out of $i$. Therefore if the states having low probability are deleted, the
probability of state $i$ will not be significantly affected. The states have of course
to be deleted prior to solving the set of linear equations. The procedure amounts
to assuming that the deleted states have a probability equal to zero. Denoting
the set of deleted states by $X_T$, the probability of this subset if there were no
truncation is $p_T = \sum_{i \in X_T} p_i$. Since the probability of the rest of the
state space is now one, i.e.

$$\sum_{i \in (X - X_T)} p_i = 1$$

the probability $p_T$ will be distributed over the states $i \in (X - X_T)$, where $X$ is the
system state space. If $p_T$ is small, then the probability distribution of the rest of
the states will not be significantly affected. The success of the truncation
method depends upon selecting low probability states for truncation. The
following considerations should be kept in mind while employing truncation.

1. The probability $p_i$, $i \in X_T$, is less than $p_j$, $j \in (X - X_T)$. In words, the biggest
probability in the truncated subset should be less than the smallest probability
of the remaining state space. In systems consisting of two state components,
this is not hard to achieve. The state space may be divided into subsets, each
subset having states of a certain level of coincident failures. For a system of $n$
identical components there will be $(n+1)$ subsets. These subsets will have the
following states:

Subset number    States description

1                All components up
2                One component down
...
n+1              n components down

An arbitrary level of truncation should be first selected; for example the 
states having three or more than three coincident failures can be truncated. 
The computation can then be repeated by including the next subset, i.e. the 
states having three coincident failures. If the new values are not significantly 
different from the previous ones, the computation can be stopped, otherwise 
one more subset should be included and the computation repeated. 

If the units have derated levels or if there is more than one down state, for
example, repair and switching states, the same procedure should be adopted
with the addition that states such as switching, which are associated with the
repair state, should also be retained.

In the state space truncation technique, the probabilities of the states 
adjacent to the truncation boundary are affected the most and the effect 
decreases when moving away from the boundary. 

2. After the states have been truncated, the state transition diagram should
be examined to see if the process of truncation has generated any absorbing
states. Since the computer program generates only the transition rate matrix,
the absorbing states can be located by examining this matrix. An absorbing
state will have transitions into it but not out of it. The $ij$th element of the
transition rate matrix gives the transition rate from state $i$ to state $j$. Therefore
if the $i$th row is empty, this means that the $i$th state is absorbing. Either the
absorbing state should be deleted or the states whose truncation has generated
this absorbing state should be retained.
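The two considerations above can be illustrated on a small dependent model. The sketch below is an assumption-laden illustration rather than the program referred to in the text: it uses a hypothetical system of four identical components served by a single repair crew (so that the components are not independent), hypothetical rates, and a plain Gaussian elimination in place of a library solver.

```python
def absorbing_states(rates):
    """States whose row of the transition rate matrix has no off-diagonal entry."""
    return [i for i, row in enumerate(rates)
            if all(r == 0 for j, r in enumerate(row) if j != i)]

def steady_state(rates):
    """Solve the balance equations with the last one replaced by sum(p) = 1.

    rates[i][j] is the transition rate from state i to state j (i != j).
    """
    n = len(rates)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                a[j][i] += rates[i][j]   # frequency of entering j from i
                a[i][i] -= rates[i][j]   # total rate of leaving i
    a[n - 1] = [1.0] * n                 # normalizing equation
    b = [0.0] * (n - 1) + [1.0]
    for c in range(n):                   # elimination with partial pivoting
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        b[c], b[piv] = b[piv], b[c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
            b[r] -= f * b[c]
    p = [0.0] * n
    for r in range(n - 1, -1, -1):       # back substitution
        p[r] = (b[r] - sum(a[r][k] * p[k] for k in range(r + 1, n))) / a[r][r]
    return p

def single_crew_model(n_states, lam, mu, n_comp):
    """State k = k components down; truncation keeps only k < n_states."""
    rates = [[0.0] * n_states for _ in range(n_states)]
    for k in range(n_states - 1):
        rates[k][k + 1] = (n_comp - k) * lam   # one more failure
        rates[k + 1][k] = mu                   # one repair at a time
    return rates

lam, mu = 0.001, 0.1                               # hypothetical rates
full = steady_state(single_crew_model(5, lam, mu, 4))
trunc = steady_state(single_crew_model(3, lam, mu, 4))
rel_change = abs(trunc[0] - full[0]) / full[0]
print(absorbing_states(single_crew_model(3, lam, mu, 4)), rel_change)
```

In this illustration the truncated model has no absorbing state, its probabilities still sum to one, and the probability of the all-up state changes only in the fifth significant figure, because the deleted states carry very little probability.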

Example: Consider a transmission system consisting of five links subjected to a
common two state fluctuating environment, normal and stormy weather. There are 32
states in each environment and a total of 64 states. The distribution of states in
each environment state is shown below.

Subset number                1     2     3     4     5     6
Number of elements failed    0     1     2     3     4     5
Number of states             1     5    10    10     5     1


If, however, the probabilities of failure of three or more components can be
assumed to be zero, the states in subsets 4 to 6 can be ignored and the matrix of
transition rates is reduced from 64 x 64 to 32 x 32. A number of studies were
performed on this system to test the sensitivity of the probability distribution
over the various system states to truncation. The results indicate that this
distribution is relatively insensitive to truncation. The results of a sample study
are shown in Table 5.2 along with the relevant component data. Table 5.2A
gives the availabilities of various states in the original model under different
component and weather state parameters. Identical environment states are
shown merged together. Tables 5.2B to 5.2E show the effects of truncation.
The limit of components on outage is indicated by MC, e.g. MC = 4 means that
the probability of more than four components being failed is taken as zero. As can
be seen from Tables 5.2B to 5.2E, the percentage error is almost negligible. This
error, however, increases as MC decreases. It is obvious that the larger the number of
components, the less sensitive is the probability distribution to conditional
truncation. For a large transmission system, the system states can, therefore, be
conditionally truncated without causing any significant error.
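The state counts quoted in this example follow from simple combinatorics, as the following minimal sketch shows (the variable names are illustrative).

```python
from math import comb

n_links, n_env = 5, 2   # five two-state links, two weather states

# Number of states with k links failed, for k = 0..5, in one environment.
per_level = [comb(n_links, k) for k in range(n_links + 1)]

full_size = n_env * sum(per_level)        # all subsets retained
kept_size = n_env * sum(per_level[:3])    # subsets 1 to 3 only (at most 2 failed)
print(per_level, full_size, kept_size)
```

The two sizes correspond to the 64 x 64 and 32 x 32 transition rate matrices mentioned above.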



Sequential Truncation 

Sequential truncation can be described as the process of building the reliability 
model by adding components or subsystems one by one and deleting the low 
probability states at each step. This method consumes more computation time 
than direct state space truncation but it is more manageable. In direct 
truncation, the decision to delete states has to be made prior to the solution of 
the state probabilities. In sequential truncation, the state probabilities are 
calculated at each step and the states with probabilities less than a reference 
value are deleted. The assumption, which is generally valid, is that the
probability of a given state will be decreased after another component has been
added to the system. Assume that at a particular step the system has $q$ states
designated $S_1, S_2, \ldots, S_q$, and a component having $p$ states, designated
$C_1, C_2, \ldots, C_p$, is added. If this component were independent of the system which
has been built up to this point, then there would be $q \times p$ states in the resulting
system, there being $p$ states for every state of the system up to this point. This
may be represented as

$$\begin{array}{cccc}
S_1C_1 & S_1C_2 & \cdots & S_1C_p \\
S_2C_1 & S_2C_2 & \cdots & S_2C_p \\
S_3C_1 & S_3C_2 & \cdots & S_3C_p \\
\vdots & \vdots &        & \vdots \\
S_qC_1 & S_qC_2 & \cdots & S_qC_p
\end{array}$$



Table 5.2 A: Availabilities of the states in the original model (tabulated values not recoverable from the source)



B: Percentage difference from the exact values, MC = 4

                                         % Difference from the exact values in Table A
         Number of    Number of          % Failures during S.W. = 20          % Failures during S.W. = 80
Group    components   identical states
number   down         in the group       5 hours    10 hours   22.5 hours     5 hours    10 hours   22.5 hours

1        0            1                  0.0        0.0        0.0            0.0        0.0        0.0
2        1            5                  0.0        0.0        0.0            0.0        0.0        0.0
3        2            10                 0.0        0.0        0.0            0.000001   0.000001   0.0
4        3            10                 0.000013   0.000005   0.000009       0.000036   0.000050   0.000009
5        4            5                  0.004525   0.000022   0.001343       0.002240   0.005604   0.001617

C: Percentage difference from the exact values, MC = 3

                                         % Difference from the exact values in Table A
         Number of    Number of          % Failures during S.W. = 20          % Failures during S.W. = 80
Group    components   identical states
number   down         in the group       5 hours    10 hours   22.5 hours     5 hours    10 hours   22.5 hours

1        0            1                  0.0        0.0        0.0            0.0        0.0        0.000002
2        1            5                  0.0        0.0        0.0            0.000004   0.000001   0.000003
3        2            10                 0.000009   0.000003   0.000003       0.000165   0.000035   0.000042
4        3            10                 0.002385   0.001277   0.000535       0.007575   0.004306   0.005267



D: Percentage difference from the exact values, MC = 2

                                         % Difference from the exact values in Table A
         Number of    Number of          % Failures during S.W. = 20          % Failures during S.W. = 80
Group    components   identical states
number   down         in the group       5 hours    10 hours   22.5 hours     5 hours    10 hours   22.5 hours

1        0            1                  0.0        0.000001   0.000006       0.000018   0.000052   0.000159
2        1            5                  0.000001   0.000002   0.000001       0.000065   0.000062   0.000175
3        2            10                 0.000466   0.000557   0.000928       0.007762   0.009008   0.008023



E: Percentage difference from the exact values, MC = 1

                                         % Difference from the exact values in Table A
         Number of    Number of          % Failures during S.W. = 20          % Failures during S.W. = 80
Group    components   identical states
number   down         in the group       5 hours    10 hours   22.5 hours     5 hours    10 hours   22.5 hours

1        0            1                  0.000174   0.000534   0.002139       0.001583   0.003695   0.009718
2        1            5                  0.000226   0.000194   0.002276       0.014940   0.013217   0.016551



The probability of state $S_iC_j$ will be the product of the probabilities of states
$S_i$ and $C_j$. Since the probability of state $C_j$ cannot be greater than one, the
probability of $S_iC_j$ cannot exceed the probability of $S_i$. Therefore if the
probability of state $S_i$ is less than the reference value, the probability of $S_iC_j$
will also be less than the reference value.
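For the independent case just described, one step of sequential truncation can be sketched as follows. This is only an illustration under stated assumptions: the function name, the two-state component with unavailability 0.01 and the reference value of 1e-5 are hypothetical, and identical states are not merged. When dependent transition modes are present, the state probabilities at each step would have to come from solving the system equations instead of from a product.

```python
def add_component(system, component, threshold):
    """Combine every system state with every component state, then truncate.

    `system` maps a tuple of component states to its probability;
    `component` maps a state label of the added component to its probability.
    """
    combined = {}
    for s_state, s_prob in system.items():
        for c_state, c_prob in component.items():
            # Independent case: probability of S_i C_j is the product.
            combined[s_state + (c_state,)] = s_prob * c_prob
    # Delete the states whose probability is below the reference value.
    return {st: pr for st, pr in combined.items() if pr >= threshold}

component = {"U": 0.99, "D": 0.01}   # hypothetical two-state component
system = {(): 1.0}                   # empty system before any component is added
for _ in range(3):
    system = add_component(system, component, threshold=1e-5)

deleted_probability = 1.0 - sum(system.values())
print(len(system), deleted_probability)
```

With these numbers only the state with all three components down falls below the reference value, so seven of the eight states survive.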

When, however, dependent transition modes are involved, some of the above
state combinations may not exist. The probabilities of the resulting states will,
however, be generally less than those of the states prior to combination. The
method of sequential truncation will be illustrated by application to an example
system.

The system consists of three identical railway stations as shown in Fig. 5.9.
Each station has a station lane on which platform facilities are provided for
passenger disembarkation and a through expressway without any platform
facilities. When the station lane and through expressway are both down,
traffic cannot pass through and the system is considered failed. It is assumed
that under this condition, no further component failure takes place. The
guideway is assumed to be perfectly reliable. The results of this subsystem are
to be combined with the other subsystems and therefore the probabilities and
frequencies of various states of this system are to be determined. The failure
and repair rates of the station lane are denoted by $\lambda_s$ and $\mu_s$ and those of the
expressway by $\lambda_e$ and $\mu_e$. The following numerical values have been used

Mean up time of the station lane = $1/\lambda_s$ = 800 hours
Mean down time of the station lane = $1/\mu_s$ = 2 hours
Mean up time of the expressway = $1/\lambda_e$ = 1000 hours
Mean down time of the expressway = $1/\mu_e$ = 2 hours

As the guideway is considered perfectly reliable, it can be left out of the
analysis. The state transition diagram of a station is shown in Fig. 5.10. U
stands for the up state, i.e. when both lanes are up. D means that the station
lane is down but the traffic can pass through the expressway. O denotes
the complete station outage, i.e. both lanes are out, and Ō stands for partial
outage, i.e. the station lane is working but the through expressway is down.

Fig. 5.9. The subsystem functional diagram (S: station lane, E: expressway, G: guideway)

Fig. 5.10. The state transition diagram of a station

These states can be represented in the computer as shown in Table 5.3A.
The addition of one more station is shown in Table 5.3B. For each state
of the component, there is the set of system states of Table 5.3A except
that state (3,3) is an impossible state since it means that the two stations
are completely out. This is not possible as the exposure to failure is reduced
to zero as soon as one station is completely out. The system states in Table 5.3B
are numbered in serial order. The numbers in brackets indicate the
combination, the first number indicating the state number of the system before
addition and the second indicating the state number of the component being
added. Identical states can now be grouped together and the resulting description
is shown in Table 5.3C. The states with probabilities less than $10^{-5}$ are now
deleted and the resulting description is given in Table 5.3D. The state numbers
are the serial numbers and have no relationship to the state numbers in Table
5.3C.

When the third station is added, the resulting states are shown in Table 5.3E
and the states after merging identical states are given in Table 5.3F. The state
probabilities are also indicated. If more stations are to be added, then the states
with probabilities less than $10^{-5}$ can again be deleted and the procedure
repeated. The exact results, i.e. without any truncation, are shown in Table 5.4
and it can be seen that the results are almost identical. In general, the results
are slightly affected depending on the reference probability value employed for
truncation.
Table 5.3

A. Model of a single station

System   Identical states   Number of stations in state
state    from B             U    D    O    Ō              Probability

1                           1    0    0    0
2                           0    1    0    0
3                           0    0    1    0
4                           0    0    0    1

B. Addition of a station

1        (1,1)              2    0    0    0
2        (2,1)              1    1    0    0
3        (3,1)              1    0    1    0
4        (4,1)              1    0    0    1
5        (1,2)              1    1    0    0
6        (2,2)              0    2    0    0
7        (3,2)              0    1    1    0
8        (4,2)              0    1    0    1
9        (1,3)              1    0    1    0
10       (2,3)              0    1    1    0
11       (4,3)              0    0    1    1
12       (1,4)              1    0    0    1
13       (2,4)              0    1    0    1
14       (3,4)              0    0    1    1
15       (4,4)              0    0    0    2

C. Merging of identical states

1        1                  2    0    0    0              0.991051 × 10^0
2        2,5                1    1    0    0              0.495525 × 10^-2
3        3,9                1    0    1    0              0.992536 × 10^-5
4        4,12               1    0    0    1              0.396420 × 10^-2
5        6                  0    2    0    0              0.618994 × 10^-5
6        7,10               0    1    1    0              0.165058 × 10^-7
7        8,13               0    1    0    1              0.990308 × 10^-5
8        11,14              0    0    1    1              0.132036 × 10^-7
9        15                 0    0    0    2              0.396090 × 10^-5

D. Truncation of states with probabilities less than 10^-5

System   Number of stations in state
state    U    D    O    Ō              Probability

1        2    0    0    0              0.991051 × 10^0
2        1    1    0    0              0.495525 × 10^-2
3        1    0    1    0              0.992536 × 10^-5
4        1    0    0    1              0.396420 × 10^-2
5        0    2    0    0              0.618994 × 10^-5
6        0    1    0    1              0.990308 × 10^-5
7        0    0    0    2              0.396090 × 10^-5

E. Addition of the third station

System              Number of stations in state
state               U    D    O    Ō

1        (1,1)      3    0    0    0
2        (2,1)      2    1    0    0
3        (3,1)      2    0    1    0
4        (4,1)      2    0    0    1
5        (5,1)      1    2    0    0
6        (6,1)      1    1    0    1
7        (7,1)      1    0    0    2
8        (1,2)      2    1    0    0
9        (2,2)      1    2    0    0
10       (3,2)      1    1    1    0
11       (4,2)      1    1    0    1
12       (5,2)      0    3    0    0
13       (6,2)      0    2    0    1
14       (7,2)      0    1    0    2
15       (1,3)      2    0    1    0
16       (2,3)      1    1    1    0
17       (4,3)      1    0    1    1     (3,3) not possible
18       (5,3)      0    2    1    0
19       (6,3)      0    1    1    1
20       (7,3)      0    0    1    2
21       (1,4)      2    0    0    1
22       (2,4)      1    1    0    1
23       (3,4)      1    0    1    1
24       (4,4)      1    0    0    2
25       (5,4)      0    2    0    1
26       (6,4)      0    1    0    2
27       (7,4)      0    0    0    3



F. Merging of identical states

System   Identical states   Number of stations in state
state    from E             U    D    O    Ō              Probability

1        1                  3    0    0    0              0.986606
2        2,8                2    1    0    0              0.739954 × 10^-2
3        3,15               2    0    1    0              0.148435 × 10^-4
4        4,21               2    0    0    1              0.591964 × 10^-2
5        5,9                1    2    0    0              0.184865 × 10^-4
6        6,11,22            1    1    0    1              0.295760 × 10^-4
7        7,24               1    0    0    2              0.118294 × 10^-4
8        10,16              1    1    1    0              0.493508 × 10^-7
9        12                 0    3    0    0              0.153901 × 10^-7
10       13,25              0    2    0    1              0.369310 × 10^-7
11       14,26              0    1    0    2              0.295407 × 10^-7
12       17,23              1    0    1    1              0.394773 × 10^-7
13       18                 0    2    1    0              0.461670 × 10^-10
14       19                 0    1    1    1              0.738569 × 10^-10
15       20                 0    0    1    2              0.295387 × 10^-10
16       27                 0    0    0    3              0.787643 × 10^-8



Table 5.4 Model of three stations without truncation

System   State number   Number of stations in state
state    as in 5.3F     U    D    O    Ō              Probability

1        1              3    0    0    0              0.986606
2        2              2    1    0    0              0.739954 × 10^-2
3        3              2    0    1    0              0.148435 × 10^-4
4        4              2    0    0    1              0.591964 × 10^-2
5        5              1    2    0    0              0.184865 × 10^-4
6        8              1    1    1    0              0.493508 × 10^-7
7        6              1    1    0    1              0.295760 × 10^-4
8        12             1    0    1    1              0.394773 × 10^-7
9        7              1    0    0    2              0.118294 × 10^-4
10       9              0    3    0    0              0.153901 × 10^-7
11       13             0    2    1    0              0.461824 × 10^-10
12       10             0    2    0    1              0.369310 × 10^-7
13       14             0    1    1    1              0.738538 × 10^-10
14       11             0    1    0    2              0.295407 × 10^-7
15       Deleted        0    1    2    0              0.123121 × 10^-12
16       Deleted        0    0    2    1              0.984465 × 10^-13
17       15             0    0    1    2              0.295264 × 10^-10
18       16             0    0    0    3              0.787643 × 10^-8






References 

1. R. Billinton and C. Singh, Reliability Evaluation in Large Transmission
Systems, Paper No. C 72 475-2, IEEE Summer Power Meeting (1972).

2. C. Singh, Reliability Modelling and Evaluation in Electric Power Systems,
Ph.D. Thesis, University of Saskatchewan, Saskatoon (August 1972).

3. C. Singh and R. Billinton, A New Method to Determine the Failure
Frequency of a Complex System, Micro-Electronics and Reliability, Vol. 12,
No. 5 (1974).



CHAPTER 6 

Reliability Modelling in 
Non-Markovian Systems 



Introduction 

Most reliability models assume that the up and down times of the components 
are exponentially distributed. This assumption leads to a Markovian model with 
constant interstate transition rates. The analysis in such cases is relatively simple 
and the numerical results can be easily obtained. The assumption is often valid for 
the up time but the down times are likely to have a non-exponential distribution. 
When the components are independent, the steady state results, as is shown 
later, are not affected by the shape of the distribution. In the case of dependent 
failures there can be a very definite effect.

If the distributions cannot be represented by a single exponential form then 
the process becomes non-Markovian and different techniques are required for 
system solution. This chapter presents some different methods for solving 
non-Markovian systems by application to specific models. 

The Difficulty with Non-Markovian Processes 

The essential difficulty with non-Markovian processes can be illustrated by the
analysis of the reliability model of a binary unit (See Fig. 2.8). The up and
down time durations are assumed to have the distributions $\lambda \exp(-\lambda x)$ and $f(y)$
respectively. Denoting the state occupied at time $t$ by $Z(t)$, the equation of state
0, i.e. the up state, can be written as

$$P\{Z(t + \Delta t) = 0\} = P\{Z(t + \Delta t) = 0 \mid Z(t) = 0\}\,P\{Z(t) = 0\} + P\{Z(t + \Delta t) = 0 \mid Z(t) = 1\}\,P\{Z(t) = 1\} \tag{6.1}$$

It can be seen that as $\Delta t \to 0+$,

$$P\{Z(t + \Delta t) = 0 \mid Z(t) = 0\} = 1 - \lambda \Delta t$$

The distribution of down time is, however, not exponential and therefore the
repair rate $\mu(y)$ depends on the time $y$ which the component has spent in state 1.
The coefficient of the second term on the right hand side of Equation (6.1) under
the condition that the component has been in state 1 for time $y$ is therefore
$\mu(y)\Delta t$. The required coefficient can be obtained by integrating this conditional
coefficient over the distribution of the time spent in state 1 up to time $t$. It is
this dependence of the transfer rate on the random duration $y$ in state 1 that
is the essential difficulty in formulating Equation (6.1). If, however, the
transition rates were to depend on time $t$ in an explicitly known manner, there
would not be any special difficulty. For example, if the repair rate were known
as a function of time $t$, the coefficient of the second term would be $\mu(t)\Delta t$. The
basic approach to avoiding this difficulty is to convert the non-Markov process
into a Markov process by redefinition of the state space.

Method of Supplementary Variables 

This is probably the most direct method of dealing with non-Markovian systems
and will be illustrated by application to a bank of three single phase transformers
with one spare. The state transition diagram with unrestricted repair and
exponentially distributed down times has been presented in Fig. 4.4. The state
space diagram with arbitrary down time distributions and unrestricted repair is
shown in Fig. 6.1. The corresponding model when the repair facilities are
restricted is shown in Fig. 6.2. In this model, the repair on the second unit is
not started until the repair on the failed unit is completed and it is reinstalled.
The difference between the two state transition diagrams lies in the elimination
of state 5 and the transition rate out of state 4. When the repair is unrestricted,
both the transformers are simultaneously under repair, one for time $x$ and the
other for time $y$. With the repair restricted, only one transformer is under
repair in state 4 and the other failed transformer is waiting. The repair on the
second transformer cannot start until the first one has been installed and
therefore state 5 is eliminated.

Fig. 6.1. The state transition diagram of a bank of three single phase
transformers with one spare, unrestricted repair. (State 1: bank up, spare 1.
State 2: bank up, spare 0. State 3: bank down, spare 1. State 4: bank down,
spare 0. State 5: bank down, spare 2.)

Fig. 6.2. The state transition diagram of a bank of three single phase
transformers with one spare, restricted repair. (States 1 to 4 as in Fig. 6.1.)

The following notation has been used.

$f_u(x), f_r(x), f_c(x)$ = The probability density functions for the up
time, repair duration and the change out period
respectively

$S_u(x), S_r(x), S_c(x)$ = The survivor functions for the up time, repair
duration and the change out period respectively

For example, $S_u(x) = P(U > x) = \int_x^\infty f_u(y)\,dy$, where $U$ is a random variable denoting
the up time of the transformer.

Let $Y$ be a continuous positive random variable specifying the duration of
a particular state of the component until the termination of that state, with
probability density function $f(y)$ and survivor function $S(y)$. Then the
age specific transition rate is defined as

$$\lim_{\Delta y \to 0+} \frac{P(y < Y < y + \Delta y \mid y < Y)}{\Delta y} = \frac{f(y)}{S(y)}$$

$\mu(x)$ = The repair rate when the repair has been going on for
time $x$

and

$\gamma(x)$ = The change out rate when the change out has been in
operation for time $x$.

$\lambda$ = The failure rate of the transformer. This is constant as the
distribution for the up time is assumed to be exponential.
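The definition of the age specific transition rate can be illustrated numerically. In the sketch below the exponential rate of 0.5 and the Weibull-type duration with survivor function exp(-y^2) are assumptions chosen only because their rates are known in closed form (a constant and 2y respectively); they are not taken from the text.

```python
from math import exp

def age_specific_rate(pdf, survivor, y):
    """Age specific transition rate f(y) / S(y)."""
    return pdf(y) / survivor(y)

mu = 0.5  # hypothetical constant rate of an exponential duration

def exp_pdf(y):
    return mu * exp(-mu * y)

def exp_survivor(y):
    return exp(-mu * y)

# Hypothetical non-exponential duration: S(y) = exp(-y^2), f(y) = 2 y exp(-y^2).
def weibull_pdf(y):
    return 2.0 * y * exp(-y * y)

def weibull_survivor(y):
    return exp(-y * y)

rate_exp = [age_specific_rate(exp_pdf, exp_survivor, y) for y in (0.3, 1.0, 4.0)]
rate_weibull = [age_specific_rate(weibull_pdf, weibull_survivor, y) for y in (0.5, 1.5)]
print(rate_exp, rate_weibull)
```

The exponential duration gives the same rate at every age, which is why it leads to a Markovian model, while the second duration has a rate that grows with the time already spent in the state.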



Exponential Models 

If the repair and change out times are also assumed to be exponentially
distributed, then the transition rates are all constant, i.e. $\mu(x) = \mu$ and $\gamma(x) = \gamma$.
The process is Markovian as the random variables that generate it are all
exponentially distributed.


Unrestricted Repair 

This case has already been discussed in Chapter 4. 

Restricted Repair 

The equations for the state transition diagram of Fig. 6.2 can be written as

$$3\lambda p_1 - \mu p_2 = 0$$
$$(3\lambda + \mu)p_2 - \gamma p_3 = 0$$
$$\mu p_4 - 3\lambda p_2 = 0$$
$$\gamma p_3 - 3\lambda p_1 - \mu p_4 = 0$$

Any three equations of the above set together with the relationship

$$\sum_{i=1}^{4} p_i = 1$$

can be solved to obtain the steady state availabilities
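A minimal numerical sketch of this solution is given below. It assumes the balance equations of Fig. 6.2 in the form 3λp1 = μp2, (3λ + μ)p2 = γp3 and μp4 = 3λp2, expresses every probability relative to p1 and then normalizes; the function name and the numerical rates are hypothetical.

```python
def restricted_repair_exponential(lam, mu, gamma):
    """Steady state probabilities [p1, p2, p3, p4] for exponential repair
    and change out, restricted repair."""
    r2 = 3.0 * lam / mu                  # p2 / p1 from 3*lam*p1 = mu*p2
    r3 = (3.0 * lam + mu) * r2 / gamma   # p3 / p1 from (3*lam + mu)*p2 = gamma*p3
    r4 = 3.0 * lam * r2 / mu             # p4 / p1 from mu*p4 = 3*lam*p2
    total = 1.0 + r2 + r3 + r4           # normalization: probabilities sum to one
    return [1.0 / total, r2 / total, r3 / total, r4 / total]

lam, mu, gamma = 1e-4, 1e-2, 0.5         # hypothetical rates per hour
p1, p2, p3, p4 = restricted_repair_exponential(lam, mu, gamma)

# The fourth balance equation was not used above and serves as a check.
residual = gamma * p3 - 3.0 * lam * p1 - mu * p4
print(p1, p2, p3, p4, residual)
```

The unused fourth equation is satisfied to rounding error, which reflects the fact that only three of the four balance equations are independent.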




Non-exponential Models 

A. General Distributions for Repair and Change Out Periods, Restricted Repair.

When the repair and change out period distributions are non-exponential, the
repair rate $\mu(x)$ and the change out rate $\gamma(x)$ are functions of the age of the
repair and change out. The stochastic process is, therefore, non-Markovian. The
most direct method of tackling this process is by the inclusion of sufficient
supplementary variables in the specification of the state of the system to make
the process Markovian. In the case of transformer banks the supplementary
variables are the times expended in the repair or change out process. The
resulting Markov process is in continuous time and has a state space which is
multi-dimensional and partly discrete and partly continuous.

Define

$p_i(t)$ = The probability that the system is in state $i$ at time $t$.

$$p_i(x;t) = \lim_{\Delta x \to 0+} \frac{P[\text{the system is in state } i \text{, the current operation having been started in } (t - x - \Delta x,\ t - x)]}{\Delta x}$$

The current operation is the process of repair or change out due to which
the system is in state $i$ and as soon as this operation is terminated the system will
transit out of this state. For example in Fig. 6.2

$p_4(x;t)\Delta x$ = The probability that the system is in state 4 at time $t$ and
the elapsed time since the repair started on the transformer
bank lies in the interval $(x, x + \Delta x)$

The forward equations of the resulting Markov process can be written by
considering the transitions during the increment $\Delta t$.

$$p_1(t + \Delta t) = p_1(t)(1 - 3\lambda \Delta t) + \Delta t \int_0^\infty p_2(x;t)\mu(x)\,dx$$

$$p_2(x + \Delta t;\, t + \Delta t) = p_2(x;t)\{1 - (\mu(x) + 3\lambda)\Delta t\}$$

$$p_3(x + \Delta t;\, t + \Delta t) = \{1 - \gamma(x)\Delta t\}\,p_3(x;t)$$

$$p_4(x + \Delta t;\, t + \Delta t) = \{1 - \mu(x)\Delta t\}\,p_4(x;t) + 3\lambda \Delta t\, p_2(x;t)$$

The resulting differential equations as $\Delta t \to 0+$ are

$$\frac{dp_1(t)}{dt} = -3\lambda p_1(t) + \int_0^\infty p_2(x;t)\mu(x)\,dx \tag{6.2}$$

$$\frac{\partial p_2(x;t)}{\partial t} + \frac{\partial p_2(x;t)}{\partial x} = -\{3\lambda + \mu(x)\}\,p_2(x;t) \tag{6.3}$$

$$\frac{\partial p_3(x;t)}{\partial t} + \frac{\partial p_3(x;t)}{\partial x} = -\gamma(x)\,p_3(x;t) \tag{6.4}$$

$$\frac{\partial p_4(x;t)}{\partial t} + \frac{\partial p_4(x;t)}{\partial x} = -\mu(x)\,p_4(x;t) + 3\lambda p_2(x;t) \tag{6.5}$$

Equations (6.2)-(6.5) can be solved under the boundary conditions

$$p_2(0;t) = \int_0^\infty p_3(x;t)\gamma(x)\,dx \tag{6.6}$$

$$p_3(0;t) = 3\lambda p_1(t) + \int_0^\infty p_4(x;t)\mu(x)\,dx \tag{6.7}$$

and

$$p_4(0;t) = 0 \tag{6.8}$$

The boundary Equation (6.6) results from the fact that as soon as the spare
or repaired transformer is reinstalled the system enters state 2. Similar reasoning
holds for Equation (6.7). Equation (6.8) states that it is impossible to be in
state 4 without the transformer being under repair for some time.

It is interesting to indicate at this point that Equations (6.2)-(6.8) can
also be obtained using the frequency balancing concept outlined in Chapter 3.
Following the approach given in that chapter, the equation for state 2 under the
condition that repair has been in progress for time $x$ can be written as

$$\Delta\{p_2(x;t)\Delta x\} = -\{3\lambda + \mu(x)\}\,p_2(x;t)\,\Delta x\,\Delta t$$

i.e.

$$\Delta p_2(x;t) = -\{3\lambda + \mu(x)\}\,p_2(x;t)\,\Delta t$$

Now knowing that the small increases in both $x$ and $t$ are $\Delta t$

$$\Delta p_2(x;t) = \frac{\partial p_2(x;t)}{\partial t}\Delta t + \frac{\partial p_2(x;t)}{\partial x}\Delta t = -\{3\lambda + \mu(x)\}\,p_2(x;t)\,\Delta t$$

That is

$$\frac{\partial p_2(x;t)}{\partial t} + \frac{\partial p_2(x;t)}{\partial x} = -\{3\lambda + \mu(x)\}\,p_2(x;t)$$

which is the same as Equation (6.3).

Since the primary interest is in the steady state availabilities, Equations
(6.2)-(6.8) reduce to the equilibrium equations as $t \to \infty$

$$3\lambda p_1 = \int_0^\infty p_2(x)\mu(x)\,dx \tag{6.9}$$

$$\frac{dp_2(x)}{dx} = -\{3\lambda + \mu(x)\}\,p_2(x) \tag{6.10}$$

$$\frac{dp_3(x)}{dx} = -\gamma(x)\,p_3(x) \tag{6.11}$$

$$\frac{dp_4(x)}{dx} = -\mu(x)\,p_4(x) + 3\lambda p_2(x) \tag{6.12}$$

$$p_2(0) = \int_0^\infty p_3(x)\gamma(x)\,dx \tag{6.13}$$

$$p_3(0) = 3\lambda p_1 + \int_0^\infty p_4(x)\mu(x)\,dx \tag{6.14}$$

and

$$p_4(0) = 0 \tag{6.15}$$
In these equations

$p_i$ = The steady state probability of being in state $i$

and

$p_i(x)\Delta x$ = The steady state probability of being in state $i$ and the elapsed
time since the current operation started lies in the interval
$(x, x + \Delta x)$.

It should be noted that $p_i(x)\Delta x$ denotes the probability of a continuous
state since $x$ may lie anywhere in the interval $(0, \infty)$. The availability of the
system condition denoted by state $i$ can, therefore, be obtained by

$$p_i = \int_0^\infty p_i(x)\,dx \tag{6.16}$$

Equations (6.9)-(6.15) together with the normalizing equation

$$\sum_{i=1}^{4} p_i = 1 \tag{6.17}$$

can be solved to obtain the steady state availabilities.
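Equation (6.11) on solution gives

$$p_3(x) = p_3(0)\exp\left\{-\int_0^x \gamma(w)\,dw\right\} \tag{6.18}$$

Since it is well known that

$$S_c(x) = \exp\left\{-\int_0^x \gamma(w)\,dw\right\}$$

$$p_3 = \int_0^\infty p_3(x)\,dx = p_3(0)\int_0^\infty S_c(x)\,dx = p_3(0)/\gamma$$

where

$1/\gamma$ = the mean change out period

Therefore

$$p_3 = \frac{1}{\gamma}\left\{3\lambda p_1 + \int_0^\infty p_4(x)\mu(x)\,dx\right\} \tag{6.19}$$

Solving Equation (6.10)

$$p_2(x) = p_2(0)\exp\left\{-\int_0^x (3\lambda + \mu(w))\,dw\right\} \tag{6.20}$$

$$p_2 = \int_0^\infty p_2(x)\,dx = p_2(0)\int_0^\infty \exp(-3\lambda x)\,S_r(x)\,dx = \frac{1}{3\lambda}(1 - E)\int_0^\infty p_3(x)\gamma(x)\,dx \tag{6.21}$$

where

$$E = \int_0^\infty f_r(x)\,e^{-3\lambda x}\,dx = E_r(e^{-3\lambda x}) \tag{6.22}$$

Substituting from Equation (6.18) into (6.21)

$$p_2 = \frac{\gamma}{3\lambda}(1 - E)\,p_3 \tag{6.23}$$

Substituting (6.20) into (6.9)

$$3\lambda p_1 = p_2(0)\int_0^\infty \exp\left\{-\int_0^x (3\lambda + \mu(w))\,dw\right\}\mu(x)\,dx = p_2(0)\int_0^\infty f_r(x)\,e^{-3\lambda x}\,dx = \gamma E p_3 \tag{6.24}$$

Equation (6.12) can be solved using the boundary condition (6.15) giving

$$p_4(x) = \exp\left\{-\int_0^x \mu(w)\,dw\right\}\int_0^x 3\lambda p_2(y)\exp\left\{\int_0^y \mu(w)\,dw\right\}dy$$

Substituting from Equation (6.20) and simplifying

$$p_4(x) = \gamma p_3\,(1 - e^{-3\lambda x})\,S_r(x) \tag{6.25}$$

and

$$p_4 = \int_0^\infty p_4(x)\,dx = \gamma p_3\left\{\frac{1}{\mu} - \frac{1 - E}{3\lambda}\right\} \tag{6.26}$$

where

$1/\mu$ = Mean repair time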
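The integral of $e^{-3\lambda x} S_r(x)$ that appears in the expressions for $p_2$ and $p_4$ can be verified by integration by parts. The short derivation below is a supplementary check, not part of the original text; it uses only $dS_r(x)/dx = -f_r(x)$, $S_r(0) = 1$ and the definition of $E$ as the integral of $f_r(x)e^{-3\lambda x}$.

```latex
\begin{aligned}
\int_0^\infty e^{-3\lambda x} S_r(x)\,dx
  &= \left[-\frac{e^{-3\lambda x}}{3\lambda}\,S_r(x)\right]_0^\infty
     - \frac{1}{3\lambda}\int_0^\infty e^{-3\lambda x} f_r(x)\,dx \\
  &= \frac{1}{3\lambda} - \frac{E}{3\lambda}
   = \frac{1 - E}{3\lambda}
\end{aligned}
```

Together with $\int_0^\infty S_r(x)\,dx = 1/\mu$, this gives the bracketed term in the expression for $p_4$.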



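From (6.24)

$$p_3 = \frac{3\lambda}{\gamma E}\,p_1 \tag{6.27}$$

From (6.23)

$$p_2 = \frac{1 - E}{3\lambda}\,\gamma p_3 = \frac{1 - E}{E}\,p_1 \tag{6.28}$$

From (6.26) and (6.27)

$$p_4 = \frac{3\lambda}{E}\left\{\frac{1}{\mu} - \frac{1 - E}{3\lambda}\right\}p_1 \tag{6.29}$$

Substituting from Equations (6.27)-(6.29) into (6.17) and simplifying

$$p_1 = 1/D \quad \text{where} \quad D = 1 + \frac{3\lambda}{E}\left(\frac{1}{\mu} + \frac{1}{\gamma}\right)$$

$$p_2 = \frac{1 - E}{ED}, \qquad p_3 = \frac{3\lambda}{\gamma E D}, \qquad p_4 = \frac{3\lambda}{ED}\left\{\frac{1}{\mu} - \frac{1 - E}{3\lambda}\right\}$$

$$p_{DN} = p_3 + p_4 = \frac{3\lambda}{ED}\left\{\frac{1}{\gamma} + \frac{1}{\mu} - \frac{1 - E}{3\lambda}\right\} \tag{6.30}$$

The frequency

$$f_{DN} = \int_0^\infty p_3(x)\gamma(x)\,dx = p_3(0)\int_0^\infty f_c(x)\,dx = p_3(0) = \gamma p_3 = \frac{3\lambda}{ED} \tag{6.31}$$

It can be seen that the equations using a general distribution are different
from those derived assuming an exponential distribution. The availabilities
are now dependent upon the transform $E$, which for the exponential
distribution of repair reduces to

$$E = \int_0^\infty \mu e^{-\mu x}\,e^{-3\lambda x}\,dx = \frac{\mu}{3\lambda + \mu}$$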
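The dependence of the availabilities on the transform E can be explored numerically. The sketch below evaluates the expressions for p1 to p4 in terms of E for two repair time distributions with the same mean: the exponential one, and a repair time fixed at its mean, for which E is exp(-3λ/μ). The second distribution, the function name and all numerical values are assumptions introduced only for illustration.

```python
from math import exp

def availabilities(lam, mean_repair, mean_changeout, E):
    """Steady state probabilities (p1, p2, p3, p4) for restricted repair,
    written in terms of the transform E of the repair time density."""
    D = 1.0 + (3.0 * lam / E) * (mean_repair + mean_changeout)
    p1 = 1.0 / D
    p2 = (1.0 - E) / E * p1
    p3 = 3.0 * lam * mean_changeout / E * p1
    p4 = (3.0 * lam / E) * (mean_repair - (1.0 - E) / (3.0 * lam)) * p1
    return p1, p2, p3, p4

lam = 1e-4              # hypothetical failure rate per hour
mean_repair = 100.0     # hypothetical mean repair time, hours
mean_changeout = 2.0    # hypothetical mean change out period, hours
mu = 1.0 / mean_repair

E_exponential = mu / (3.0 * lam + mu)       # exponential repair time
E_constant = exp(-3.0 * lam * mean_repair)  # repair time fixed at its mean (assumed)

p_exp = availabilities(lam, mean_repair, mean_changeout, E_exponential)
p_const = availabilities(lam, mean_repair, mean_changeout, E_constant)
print(p_exp, p_const)
```

With the exponential transform the results agree with the balance equations of the exponential model, for example p2/p1 = 3λ/μ, while the fixed repair time gives a different split of probability between the states even though the mean repair time is unchanged.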



Reliability Modelling in Non-Markovian Systems 173 



It is interesting to note that the steady state probabilities depend only upon 
the reciprocal of the mean change out time, i.e. the average change out rate. 
Therefore as long as the mean change out time stays the same, the form of its 
probability density function does not affect the steady state probabilities and 
frequencies (see that/£ > yy=p 3 7). This always happens when the operation is 
started and completed in the same state as for example in the present case 
the reinstallation is started and completed in state 3 of the system. If, however, 
the reinstallation were not always completed in state 3, then the form of the 
probability density affects the steady state probabilities. This is seen in the next 
section. 
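The closed forms of this section can be checked numerically. The sketch below is only illustrative: it assumes the relations $p_1 = 1/D$ with $D = 1 + (3\lambda/E)(1/\gamma + 1/\mu)$, $p_2 = (1-E)p_1/E$, $p_3 = 3\lambda p_1/(\gamma E)$ and $p_4 = (3\lambda/E)\{1/\mu - (1-E)/3\lambda\}p_1$ as read from Equations (6.27)-(6.29), evaluates the transform E by a simple trapezoidal rule, and uses hypothetical parameter values. The function names are not from the text.

```python
import math

def steady_state_restricted(lam, mu, gamma, E):
    """Return (p1, p2, p3, p4) for the restricted repair model, given the transform E."""
    D = 1.0 + (3.0 * lam / E) * (1.0 / gamma + 1.0 / mu)
    p1 = 1.0 / D
    p2 = (1.0 - E) / E * p1
    p3 = 3.0 * lam / (gamma * E) * p1
    p4 = (3.0 * lam / E) * (1.0 / mu - (1.0 - E) / (3.0 * lam)) * p1
    return p1, p2, p3, p4

def transform_E(repair_pdf, lam, upper, steps=200000):
    """Trapezoidal estimate of E = integral of f_r(x) exp(-3 lam x) over (0, upper)."""
    h = upper / steps
    g = lambda x: repair_pdf(x) * math.exp(-3.0 * lam * x)
    total = 0.5 * (g(0.0) + g(upper))
    for k in range(1, steps):
        total += g(k * h)
    return total * h

# Hypothetical rates (per hour): failure, repair, change out.
lam, mu, gamma = 0.001, 0.05, 0.5

# Exponential repair: E should agree with mu / (3 lam + mu).
E_exp = transform_E(lambda x: mu * math.exp(-mu * x), lam, upper=2000.0)
probs = steady_state_restricted(lam, mu, gamma, E_exp)

print("E (numerical), E (closed form):", E_exp, mu / (3 * lam + mu))
print("p1..p4:", probs, "sum:", sum(probs))
```

Any other repair time density can be passed as `repair_pdf`; only the value of E changes, which is how the form of the repair distribution enters the results.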



B. General Distribution for Change Out, Exponential Distribution for Repair, 
Unrestricted Repair. 

It can be seen from Fig. 6.1 that in this case the reinstallation phase initiated 
in state 3 may not always be completed in state 3 but sometimes in state 5. 
The concept of average change out rate thus does not apply here. 

Defining 

Pi(x; t)Ax = The probability that the system is in the state i at time t 
and the elapsed time since the start of reinstallation lies 
in the interval (x, x + Ax) 

the differential equations can be written as 
$$\frac{dp_1(t)}{dt} = -3\lambda p_1(t) + \mu p_2(t) + \int_0^\infty p_5(x;t)\,\gamma(x)\,dx \tag{6.32}$$

$$\frac{dp_2(t)}{dt} = -(3\lambda+\mu)\,p_2(t) + \int_0^\infty p_3(x;t)\,\gamma(x)\,dx \tag{6.33}$$

$$\frac{\partial p_3(x;t)}{\partial t} + \frac{\partial p_3(x;t)}{\partial x} = -\{\mu + \gamma(x)\}\,p_3(x;t) \tag{6.34}$$

$$\frac{dp_4(t)}{dt} = -2\mu p_4(t) + 3\lambda p_2(t) \tag{6.35}$$

and

$$\frac{\partial p_5(x;t)}{\partial t} + \frac{\partial p_5(x;t)}{\partial x} = -\gamma(x)\,p_5(x;t) + \mu p_3(x;t) \tag{6.36}$$

These equations can be solved using the boundary conditions

$$p_3(0;t) = 3\lambda p_1(t) + 2\mu p_4(t) \tag{6.37}$$

and

$$p_5(0;t) = 0 \tag{6.38}$$




Equation (6.38) indicates that the reinstallation phase is never started in 
state 5. 

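Under equilibrium conditions, i.e. as $t \to \infty$, the Equations (6.32)-(6.38)
reduce to

$$3\lambda p_1 = \mu p_2 + \int_0^\infty p_5(x)\,\gamma(x)\,dx \tag{6.39}$$

$$(3\lambda+\mu)\,p_2 = \int_0^\infty \gamma(x)\,p_3(x)\,dx \tag{6.40}$$

$$\frac{dp_3(x)}{dx} = -\{\mu + \gamma(x)\}\,p_3(x) \tag{6.41}$$

$$2\mu p_4 = 3\lambda p_2 \tag{6.42}$$

$$\frac{dp_5(x)}{dx} = -\gamma(x)\,p_5(x) + \mu p_3(x) \tag{6.43}$$

$$p_3(0) = 3\lambda p_1 + 2\mu p_4 \tag{6.44}$$

$$p_5(0) = 0 \tag{6.45}$$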
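On solving Equation (6.41) and substituting from Equation (6.44)

$$p_3(x) = (3\lambda p_1 + 2\mu p_4)\exp\left\{-\int_0^x \bigl(\mu + \gamma(w)\bigr)\,dw\right\} \tag{6.46}$$

Therefore

$$p_3 = \int_0^\infty p_3(x)\,dx = (3\lambda p_1 + 2\mu p_4)\int_0^\infty e^{-\mu x}\,S_c(x)\,dx = (3\lambda p_1 + 2\mu p_4)\,\frac{1-E_c}{\mu} \tag{6.47}$$

where

$$E_c = \int_0^\infty f_c(x)\,e^{-\mu x}\,dx$$

Substituting (6.46) into (6.40) and simplifying

$$(3\lambda+\mu)\,p_2 = (3\lambda p_1 + 2\mu p_4)\,E_c \tag{6.48}$$

Solving Equation (6.43) using (6.45)

$$p_5(x) = \exp\left\{-\int_0^x \gamma(w)\,dw\right\}\int_0^x \mu\,p_3(y)\exp\left\{\int_0^y \gamma(w)\,dw\right\}dy$$

Substituting from Equation (6.46) and simplifying

$$p_5(x) = p_3(0)\left(1-e^{-\mu x}\right)S_c(x) \tag{6.49}$$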



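Therefore

$$p_5 = \int_0^\infty p_5(x)\,dx = (3\lambda p_1 + 2\mu p_4)\left(\frac{1}{\gamma} - \frac{1-E_c}{\mu}\right) \tag{6.50}$$

Substituting from (6.42) into (6.48) and simplifying

$$p_2 = \frac{3\lambda E_c}{3\lambda(1-E_c)+\mu}\,p_1 \tag{6.51}$$

From (6.42)

$$p_4 = \frac{3\lambda}{2\mu}\,p_2 = \frac{9\lambda^2 E_c}{2\mu\,[3\lambda(1-E_c)+\mu]}\,p_1 \tag{6.52}$$

Substituting for $p_4$ in (6.47)

$$p_3 = \frac{3\lambda}{\mu}\,(1-E_c)\left[1 + \frac{3\lambda E_c}{3\lambda(1-E_c)+\mu}\right]p_1 \tag{6.53}$$

Substituting for $p_4$ in (6.50)

$$p_5 = \frac{3\lambda}{\mu}\left(\frac{\mu}{\gamma} - 1 + E_c\right)\left[1 + \frac{3\lambda E_c}{3\lambda(1-E_c)+\mu}\right]p_1 \tag{6.54}$$

Substituting into the normalizing equation

$$p_1 + p_2 + p_3 + p_4 + p_5 = 1$$

and simplifying

$$p_1 = \frac{2\gamma\mu\,\{3\lambda(1-E_c)+\mu\}}{B}$$

where

$$B = \{3\lambda(1-E_c)+\mu\}(2\gamma\mu + 6\lambda\mu) + 3\lambda E_c\,(2\gamma\mu + 6\lambda\mu + 3\lambda\gamma)$$

$$p_2 = \frac{6\lambda\gamma\mu E_c}{B}$$

$$p_3 = \frac{6\lambda\gamma\,(3\lambda+\mu)(1-E_c)}{B}$$

$$p_4 = \frac{9\lambda^2\gamma E_c}{B}$$

$$p_5 = \frac{6\lambda\,(3\lambda+\mu)(\mu - \gamma + \gamma E_c)}{B}$$

$$P_{DN} = p_3 + p_4 + p_5 = \frac{9\lambda^2\gamma E_c + 6\lambda\mu\,(3\lambda+\mu)}{B} \tag{6.55}$$

$$P_{UP} = \frac{6\lambda\gamma\mu + 2\gamma\mu^2}{B} \tag{6.56}$$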



The frequency

$$f_{DN} = \int_0^\infty p_3(x)\,\gamma(x)\,dx + \int_0^\infty p_5(x)\,\gamma(x)\,dx$$

$$= p_3(0)\int_0^\infty \gamma(x)\exp\left\{-\int_0^x \bigl(\mu+\gamma(w)\bigr)\,dw\right\}dx + p_3(0)\int_0^\infty \left(1-e^{-\mu x}\right)f_c(x)\,dx$$

$$= p_3(0) = 3\lambda p_1 + 2\mu p_4 = 3\lambda\,(p_1+p_2) = 3\lambda P_{UP} = f_{UP} \tag{6.57}$$
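To see how the form of the change out distribution enters through $E_c$, the following sketch evaluates the five steady state probabilities of the unrestricted repair model for two change out distributions with the same mean $1/\gamma$. It assumes the closed forms $p_1 = 2\gamma\mu K/B$, $p_2 = 6\lambda\gamma\mu E_c/B$, $p_3 = 6\lambda\gamma(3\lambda+\mu)(1-E_c)/B$, $p_4 = 9\lambda^2\gamma E_c/B$ and $p_5 = 6\lambda(3\lambda+\mu)(\mu-\gamma+\gamma E_c)/B$ with $K = 3\lambda(1-E_c)+\mu$, as read from this section. The parameter values and function name are hypothetical.

```python
import math

def steady_state_unrestricted(lam, mu, gamma, Ec):
    """Return (p1, ..., p5) for the unrestricted repair model, given Ec."""
    K = 3.0 * lam * (1.0 - Ec) + mu
    B = (K * (2.0 * gamma * mu + 6.0 * lam * mu)
         + 3.0 * lam * Ec * (2.0 * gamma * mu + 6.0 * lam * mu + 3.0 * lam * gamma))
    p1 = 2.0 * gamma * mu * K / B
    p2 = 6.0 * lam * gamma * mu * Ec / B
    p3 = 6.0 * lam * gamma * (3.0 * lam + mu) * (1.0 - Ec) / B
    p4 = 9.0 * lam ** 2 * gamma * Ec / B
    p5 = 6.0 * lam * (3.0 * lam + mu) * (mu - gamma + gamma * Ec) / B
    return p1, p2, p3, p4, p5

# Hypothetical rates (per hour): failure, repair, average change out.
lam, mu, gamma = 0.001, 0.05, 0.5

# Ec = transform of the change out density evaluated at mu.
Ec_exponential = gamma / (gamma + mu)      # exponential change out time
Ec_constant = math.exp(-mu / gamma)        # constant change out time 1/gamma

p_exp = steady_state_unrestricted(lam, mu, gamma, Ec_exponential)
p_const = steady_state_unrestricted(lam, mu, gamma, Ec_constant)

pdn_exp = p_exp[2] + p_exp[3] + p_exp[4]
pdn_const = p_const[2] + p_const[3] + p_const[4]
print("P_DN, exponential change out:", pdn_exp)
print("P_DN, constant change out   :", pdn_const)
print("f_DN = 3 lam P_UP           :", 3 * lam * (p_exp[0] + p_exp[1]))
```

With equal means the two down state probabilities differ slightly, which illustrates that the average change out rate alone is not sufficient in this case.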



Semi-Markov Processes 

A semi-Markov process is a stochastic process in which the transitions from 
state to state are in accordance with a Markov Chain but the time spent in a 
state before a transition occurs is random. 

Consider a stochastic process $Z(t)$ which at time $t$ can be in any of the $n$
distinct states, $Z(t) = i$ denoting that the stochastic process is in state $i$ at time $t$.
Let the time just after the $m$th transition be denoted by $t_m$. The stochastic
process is Markovian if

$$P[Z(t_m)=j \mid Z(t_{m-1})=i,\ Z(t_{m-2})=k,\ \ldots,\ Z(t_1)=l] = P[Z(t_m)=j \mid Z(t_{m-1})=i]$$

If this probability is independent of the number of the transition, the process
is time homogeneous.

Let $\alpha_{ij}$
be the probability of going from state $i$ to $j$ in one step. The matrix of these
transition probabilities will be denoted by $A = (\alpha_{ij})$. Given that the state $i$ has
just been entered, the probability of going to state $j$ is specified by $\alpha_{ij}$ and
$F_{ij}(\cdot)$ is the distribution function of the waiting time in state $i$ given that
the next transition will be to state $j$. The transitions can therefore be thought
of as taking place in two stages. When the process has just entered state $i$, the
next state $j$ is selected according to the matrix $A$ but once $j$ has been picked, the
waiting time $X_{ij}$ is specified by $F_{ij}(\cdot)$. The Markov Chain, $A = (\alpha_{ij})$, associated
with the semi-Markov process is called an embedded Markov Chain.

Discrete time and continuous time Markov chains are special cases of the
semi-Markov process. For a discrete time Markov chain

$$F_{ij}(t) = \begin{cases} 0 & \text{for } t < c \\ 1 & \text{for } t \ge c \end{cases}$$

where $c$ is a constant. The discrete time Markov chain is therefore a semi-Markov
process in which the waiting time $X_{ij}$ is constant. For the continuous time
Markov chain

$$F_{ij}(t) = \begin{cases} 0 & \text{for } t < 0 \\ 1 - \exp\{-\lambda_i t\} & \text{for } t \ge 0,\ \lambda_i > 0 \end{cases}$$

where $\lambda_i$ is the reciprocal of the mean of $X_{ij}$.
Let

$$Q_{ij}(t) = \alpha_{ij}\,F_{ij}(t)$$

It should be noted that $F_{ij}(t)$ represents the conditional probability that a
transition will take place in time $t$ given that the process has just entered $i$
and will next enter $j$. $Q_{ij}(t)$ on the other hand is the probability that given that
the process has just entered $i$, it will transit to state $j$ in time less than or equal
to $t$. Let

$$p_{ij}(t) = P[Z(t)=j \mid Z(0)=i]$$

Then

$$p_{ij}(t) = \sum_{k}\int_0^t p_{kj}(t-x)\,dQ_{ik}(x), \qquad i \ne j \tag{6.58}$$

$$p_{jj}(t) = 1 - \sum_{k}\int_0^t \{1 - p_{kj}(t-x)\}\,dQ_{jk}(x) \tag{6.59}$$



and 

Pnit) = 1 " _ 
The above equations involve convolution integrals of the form 



t 

j g(t-x)f(x)dx 
o 

The Laplace transform of this integral is of the simple form $g(s)f(s)$. These integral
equations can therefore be reduced to linear equations by taking Laplace
transforms. The Laplace transform of Equations (6.58) and (6.59) in the matrix form
can be written as

$$sP(s) = [I - G(s)]^{-1}\,[I - F_d(s)] \tag{6.60}$$

where

$P(s)$ is the matrix whose $ij$th term $p_{ij}(s)$ is the Laplace transform of the probability $p_{ij}(t)$

$G(s)$ is the matrix whose $ij$th term is $\alpha_{ij} f_{ij}(s)$

$F_d(s)$ is the diagonal matrix whose $ii$th term is equal to $\sum_k f_{ik}(s)\,\alpha_{ik}$

$I$ is the identity matrix.

The probabilities can be obtained in the Laplace form and the time specific
solutions found by inversion. More often, however, the steady state probabilities
are required and these can be obtained directly using the following relationship

$$P_i = \frac{\pi_i m_i}{\sum_j \pi_j m_j} \tag{6.61}$$






where

$\pi_i$ = The steady state probability that the embedded Markov chain is in
state $i$.

$m_i$ = The mean residence time in state $i$.

The steady state probability vector $\pi$ can be found by solving

$$\pi A = \pi$$

along with

$$\sum_i \pi_i = 1$$

The application of a semi-Markov process is illustrated with an example of two 
three phase transformers in parallel. When a fault develops on either of the 
transformers, both the transformers are shut down. After the defective 
transformer has been isolated, the good one is returned for operation. The state 
transition diagram is shown in Fig. 6.3. 



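[Figure: state transition diagram with four states, State 1 to State 4. Labels recoverable from the figure: "2 up", "1 up", "0 up"; transition rates $2\lambda$ and $\lambda$.]

Fig. 6.3 The state transition diagram for two three phase transformers in
parallel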



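The up time and repair time are assumed to be exponentially distributed but
the change out time is assumed to have an arbitrary probability density
function $f_c(t)$ having a Laplace transform $f_c(s)$. The $A$ matrix of transition
probabilities is

$$A = \begin{pmatrix} 0 & 0 & 1 & 0 \\ \dfrac{\mu}{\lambda+\mu} & 0 & 0 & \dfrac{\lambda}{\lambda+\mu} \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}$$

The steady state probabilities of the Markov Chain are

$$\pi_1 = \frac{1}{3+2\lambda/\mu}, \quad \pi_2 = \frac{1+\lambda/\mu}{3+2\lambda/\mu}, \quad \pi_3 = \frac{1}{3+2\lambda/\mu}, \quad \pi_4 = \frac{\lambda/\mu}{3+2\lambda/\mu}$$

Also

$$m_1 = \frac{1}{2\lambda}, \quad m_2 = \frac{1}{\mu+\lambda}, \quad m_3 = \frac{1}{\gamma}, \quad m_4 = \frac{1}{2\mu}$$

where

$1/\gamma$ = mean change out time.

Substituting into Equation (6.61)

$$P_1 = \frac{1}{D}, \quad P_2 = \frac{2\lambda}{\mu D}, \quad P_3 = \frac{2\lambda}{\gamma D}, \quad P_4 = \frac{\lambda^2}{\mu^2 D}$$

where

$$D = 1 + \frac{2\lambda}{\mu} + \frac{2\lambda}{\gamma} + \frac{\lambda^2}{\mu^2}$$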

It can be seen that the probability density function of the change out time
$f_c(t)$ enters these expressions through the mean value $1/\gamma$. The
same results would therefore be obtained if the change out time were assumed 
to be exponentially distributed with the transition rate equal to the reciprocal 
of the mean change out time. This is due to the fact that the change out 
operation starts and ends in the same state, number 3. It should, however, be 
noted that the time dependent solution would be different from that assuming 
an exponential distribution. 
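The transformer example can be reproduced numerically. The sketch below is a minimal illustration, assuming the embedded transition matrix implied by the description (state 1 goes to state 3; state 2 goes to state 1 with probability $\mu/(\lambda+\mu)$ and to state 4 otherwise; states 3 and 4 both go to state 2) and mean residence times $1/2\lambda$, $1/(\lambda+\mu)$, $1/\gamma$ and $1/2\mu$. The stationary vector of the embedded chain is found by a damped power iteration, which is a substitute chosen here for simplicity rather than the method of the text; the rates are hypothetical.

```python
def embedded_stationary(A, iterations=2000):
    """Stationary vector of pi A = pi, via the lazy iteration pi <- pi (A + I) / 2."""
    n = len(A)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        step = [sum(pi[i] * A[i][j] for i in range(n)) for j in range(n)]
        pi = [0.5 * (pi[j] + step[j]) for j in range(n)]
    return pi

def semi_markov_steady_state(A, m):
    """P_i = pi_i m_i / sum_j pi_j m_j, with m the mean residence times."""
    pi = embedded_stationary(A)
    weights = [p * mi for p, mi in zip(pi, m)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical rates (per hour): failure, repair, average change out.
lam, mu, gamma = 0.01, 0.1, 1.0

A = [
    [0.0, 0.0, 1.0, 0.0],
    [mu / (lam + mu), 0.0, 0.0, lam / (lam + mu)],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
]
m = [1.0 / (2.0 * lam), 1.0 / (lam + mu), 1.0 / gamma, 1.0 / (2.0 * mu)]

P = semi_markov_steady_state(A, m)

# Closed form for comparison.
D = 1.0 + 2.0 * lam / mu + 2.0 * lam / gamma + lam ** 2 / mu ** 2
closed = [1.0 / D, 2.0 * lam / (mu * D), 2.0 * lam / (gamma * D), lam ** 2 / (mu ** 2 * D)]
print(P)
print(closed)
```

Only the mean $1/\gamma$ of the change out time appears in `m`, so any change out density with that mean gives the same steady state result.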

Device of Stages 

The device of stages is a method of representing a non-exponentially distributed 
state by a combination of stages each of which is exponentially distributed. 
The method, therefore, represents a non-Markovian model by an equivalent 
Markovian model which is generally simpler to solve. Any distribution with a 
rational Laplace transform can, in principle, be represented exactly by a stage 
combination. Though this may involve complex probabilities associated with 
the fictitious stages, the probabilities of the actual states of the system are 
always real. Many distributions not necessarily having a rational Laplace 
transform can also be reasonably approximated by relatively simple stage 
combinations. The application of this technique involves the following steps. 




1 Selection of a Stage Combination 

When the distribution has a rational Laplace transform, the stage
combination can be found by examining the roots of this transform. For other
probability distributions, or when data are fitted directly, a suitable guess has
to be made. The probability density function and the hazard rate function of the
given distribution or data should be examined. A number of simple stage
combinations and their characteristics are described later in the chapter. The
characteristics of the given distribution or data should be compared with those
of the stage combinations, and a suitable combination should be selected. The
differences between distributions like the gamma, Weibull and lognormal become
significant only in the tail regions; their hazard functions are, however, quite
distinct. Therefore both the density function and the hazard function should be
compared when selecting a proper stage combination.

2 Determination of Parameters 

When a stage combination has been selected, the next step is the derivation 
of its parameters from those of the distribution. This can be done by a moment 
matching technique which is described in this chapter. 

In addition to the constant hazard rate exhibited only by the exponential 
distribution, there are four basic hazard rate shapes. 

1. Increasing hazard rate (Fig. 6.4a) 

2. Decreasing hazard rate (Fig. 6.4b) 

3. Initial period of decreasing hazard rate followed by increasing rate (Fig. 6.4c) 

4. Initial period of increasing hazard rate followed by decreasing rate (Fig. 6.4d). 

The combinations discussed in this chapter are capable of generating these 
shapes. An awareness of this characteristic can be very useful in selecting a proper 
combination. 

General Technique for Deriving the Characteristics of Stage Combinations 

The probability density function, the survivor function and the hazard rate
function of a given stage combination can be derived in a number of ways but
a general technique, often helpful in difficult situations, is described below. Let O
be the equivalent state of a given stage combination. The transitions from this
state are assumed to terminate in an absorbing state A, as shown in Fig. 6.5.
The process is assumed to start in the same stage into which it would first transit
when state O is entered. The time spent in state O is identical with the time from
the origin as no transition is made from A to O. Under these conditions

$f_O(x)$ = The probability density function of state O, i.e., the stage
combination

= The time specific frequency of transiting from O to A

$$= \sum_{i \in O} p_i(x)\,\lambda_{iA} \tag{6.62}$$

where

$\lambda_{iA}$ = The transition rate from state $i$, member of O, to the absorbing
state A.

$S_O(x)$ = The survivor function of the stage combination

= The probability of being in state O at any time

$$= \sum_{i \in O} p_i(x) \tag{6.63}$$

and

$\phi_O(x)$ = The hazard rate

$$= f_O(x)/S_O(x) = \sum_{i \in O} p_i(x)\,\lambda_{iA} \Big/ \sum_{i \in O} p_i(x) \tag{6.64}$$

= The equivalent transfer rate from state O to A

Fig. 6.4 Some typical hazard rate functions.

Fig. 6.5 State transition diagram to determine the characteristics of a stage
combination

The evaluation of Equations (6.62) - (6.64) involves the calculation of the 
probabilities of being in various stages of state O. This is generally a simpler 
method. The main advantage is that though explicit formulae may not be 
possible, numerical values of the state probabilities can be easily determined 
using the techniques discussed in Chapter 3. The numerical values of the three 
characteristics can then be calculated using the above equations. The method 
will become more clear later when the specific equations for the stage 
combinations are derived. 
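A minimal numerical sketch of this technique follows. It assumes the stage probabilities satisfy $dp/dx = pQ$, where $Q$ holds the transition rates among the stages of O (with the exits to the absorbing state A included on the diagonal), and integrates them with a fourth-order Runge-Kutta step; the integration scheme, names and rates are illustrative choices, not those of Chapter 3. The density, survivor function and hazard rate then follow from the sums in Equations (6.62)-(6.64).

```python
import math

def stage_characteristics(Q, exit_rates, start, x_end, steps=2000):
    """Return (f, S, hazard) at time x_end for a stage combination.

    Q          : rates among the stages of O; Q[i][i] is minus the total rate out of stage i
    exit_rates : rate from each stage to the absorbing state A
    start      : index of the stage in which the process starts
    """
    n = len(Q)

    def deriv(p):
        return [sum(p[i] * Q[i][j] for i in range(n)) for j in range(n)]

    p = [1.0 if i == start else 0.0 for i in range(n)]
    h = x_end / steps
    for _ in range(steps):
        k1 = deriv(p)
        k2 = deriv([p[i] + 0.5 * h * k1[i] for i in range(n)])
        k3 = deriv([p[i] + 0.5 * h * k2[i] for i in range(n)])
        k4 = deriv([p[i] + h * k3[i] for i in range(n)])
        p = [p[i] + h * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) / 6.0 for i in range(n)]

    f = sum(p[i] * exit_rates[i] for i in range(n))   # Equation (6.62)
    S = sum(p)                                        # Equation (6.63)
    return f, S, f / S                                # Equation (6.64)

# Hypothetical example: two identical stages in series, each with rate 2 per day.
rho = 2.0
Q = [[-rho, rho],
     [0.0, -rho]]
exit_rates = [0.0, rho]

f1, S1, hazard1 = stage_characteristics(Q, exit_rates, start=0, x_end=1.0)
print(f1, S1, hazard1)
print("closed form survivor:", math.exp(-rho) * (1.0 + rho))
```

For this two stage example the survivor function is $e^{-\rho x}(1+\rho x)$, which gives a quick check on the numerical values.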



Stage Combinations 

We will now describe some specific stage combinations which are simple but 
versatile enough to approximate a variety of probability distribution functions. 
The stage combinations will be shown as approximating the down time 
distribution of a two state component whose up state is assumed to be 
exponentially distributed. This arrangement is only for the sake of illustration, 
the stage combination may be used to represent a state in any other context. 

The Stages in Series 

The stages are traversed in a sequential order and the non-negative continuous
random variable $X$, representing the component state duration, is the sum of $a$
independent exponentially distributed random variables. The Laplace transform
of the distribution of $X$ can then easily be proved to be

$$\frac{\rho_1 \cdots \rho_a}{(\rho_1+s)\cdots(\rho_a+s)} \tag{6.65}$$

The probability density function can be obtained by expressing Equation (6.65)
in the partial fraction form $\sum_{i=1}^{a} A_i\rho_i/(\rho_i+s)$.

The probability density function, therefore, is

$$\sum_{i=1}^{a} A_i\,\rho_i \exp(-\rho_i x) \tag{6.66}$$

It should be noted that $\sum A_i = 1$ but the $A_i$ do not all lie in (0, 1) and therefore
$\{A_i\}$ is not a probability distribution. It can be easily shown that for a fixed $a$
any fractional coefficient of variation between 1 and $1/\sqrt{a}$ can be produced by
a suitable choice of $\{\rho_i\}$.

The positions of the poles of Equation (6.65) determine the transition
probabilities and the number of poles equals the number of stages.

When all the stages are identically distributed with parameter $\rho$, Expression
(6.65) reduces to

$$\left(\frac{\rho}{\rho+s}\right)^{a} \tag{6.67}$$

and the corresponding probability density function is the Special Erlangian
distribution

$$\frac{\rho\,(\rho x)^{a-1}\,e^{-\rho x}}{(a-1)!} \tag{6.68}$$

where $a$ is a positive integer. The corresponding survivor function is

$$\sum_{i=1}^{a} \frac{(\rho x)^{i-1}\,e^{-\rho x}}{(i-1)!}$$

The characteristics of a family of Special Erlangian Distributions with a
constant mean of one day are shown in Fig. 6.6. The exponential is a special
case of the Special Erlangian Distribution with $a = 1$.

A generalization of Equation (6.68) is to replace the parameter $a$ restricted
to integer values by a parameter having any real positive value. The probability
density function of Equation (6.68) then becomes the Gamma distribution

$$\frac{\rho\,(\rho x)^{a-1}\,e^{-\rho x}}{\Gamma(a)} \tag{6.69}$$

where the gamma function

$$\Gamma(a) = \int_0^\infty u^{a-1}\,e^{-u}\,du$$

The mean and standard deviation of the gamma distribution are $a/\rho$ and
$\sqrt{a}/\rho$ respectively and the fractional coefficient of variation is therefore
$1/\sqrt{a}$. For a fixed mean $\mu = a/\rho$ any coefficient of variation between 1 and
$1/\sqrt{a}$ may be obtained by varying $a$ and $\rho$ in the same proportion. If $\mu$ is
kept constant then as $a, \rho \to \infty$, the coefficient of variation approaches zero,





i.e. there is no dispersion about the mean. This corresponds to the case of 
constant state duration of the component. 

Many empirical distributions can be represented, at least approximately, by
a suitable choice of parameters $a$ and $\rho$. It should, however, be realized that
as $a$ is not always an integer it may not be possible to interpret the
distribution directly in terms of stages. When $a$ is not an integer, it is preferable to
solve for integer values of $a$ using Equation (6.68) and then obtain the numerical
answer for the required $a$ by interpolation.

The Stages in Parallel 

When the stages are in parallel, there is a probability distribution $\{\omega_i\}$,
$i = 1, \ldots, a$ such that the random variable $X$ denoting the state duration of the
component, has the probability $\omega_i$ of beginning on the $i$th stage, the life
thereafter consisting of a single stage, i.e. only a single stage is used in any one
realization of $X$. The probability density function of $X$ is given by

$$\sum_{i=1}^{a} \omega_i\,\rho_i \exp(-\rho_i x) \tag{6.70}$$

The probability density function of Equation (6.70) is formally equivalent
to Equation (6.66) with the important difference that the $\omega_i$ are all non-negative
whereas there is no such restriction on the $A_i$. The Laplace transform of
Equation (6.70) is a rational one, i.e. the ratio of a polynomial of degree at
the most $a$ to a polynomial of the degree $a$. If in practice, only two stages
in parallel are required, the Expression (6.70) reduces to

$$\omega_1\rho_1 \exp(-\rho_1 x) + \omega_2\rho_2 \exp(-\rho_2 x) \tag{6.71}$$

Fig. 6.7 The state transition diagram for a component having an
exponentially distributed up time and whose down time
has a probability density function of Equation (6.70)

It can be shown that by a suitable choice of $\omega_1$, $\rho_1$ and $\rho_2$ the distribution of
Equation (6.71) can have any desired mean and any fractional coefficient of
variation between 1 and $\infty$. The state transition diagram for a component whose






up time is exponentially distributed with parameter $\lambda$ and whose down time has
probability density function (6.70) is shown in Fig. 6.7. The total transition rate
out of the up state, i.e. state 0, is $\lambda$ but it is directed towards different parallel stages
of the down time according to the probability distribution $\{\omega_i\}$. That this
transition diagram does represent the distribution can be seen by solving for
$p_0(t)$. The differential equations are

$$p_0'(t) = -\lambda p_0(t) + \sum_{i=1}^{a} \rho_i\,p_{1i}(t) \tag{6.72}$$

$$p_{1i}'(t) = -\rho_i\,p_{1i}(t) + \omega_i\lambda\,p_0(t) \tag{6.73}$$

Taking the Laplace transform with the initial condition $p_0(0) = 1$

$$s\,p_0(s) = 1 - \lambda p_0(s) + \sum_{i=1}^{a} \rho_i\,p_{1i}(s) \tag{6.74}$$

and

$$s\,p_{1i}(s) = -\rho_i\,p_{1i}(s) + \omega_i\lambda\,p_0(s) \tag{6.75}$$

where $p(s)$ is the Laplace transform of $p(t)$.

Substituting for $p_{1i}(s)$ from (6.75) into (6.74) and simplifying

$$p_0(s) = \frac{1}{s + \lambda - \lambda\displaystyle\sum_{i=1}^{a} \frac{\omega_i\rho_i}{s+\rho_i}} \tag{6.76}$$

It can be easily proved that for a component having an exponentially
distributed up time and having a down time distribution of $f(t)$

$$p_0(s) = \frac{1}{s + \lambda - \lambda f(s)} \tag{6.77}$$

where $f(s)$ = the Laplace transform of $f(t)$.

Substituting the Laplace transform of the probability density function (6.70)

$$p_0(s) = \frac{1}{s + \lambda - \lambda\displaystyle\sum_{i=1}^{a} \frac{\omega_i\rho_i}{s+\rho_i}}$$



This expression is the same as Equation (6.76) derived using the state transition 
diagram of Fig. 6.7. The state diagram is, therefore, an accurate representation. 
A generalization of Equation (6.71) is to have two series stage combinations 




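in parallel as shown in Fig. 6.8. The expression for the probability density
function is

$$f(x) = \omega_1\,\frac{\rho_1(\rho_1 x)^{a_1-1}e^{-\rho_1 x}}{(a_1-1)!} + \omega_2\,\frac{\rho_2(\rho_2 x)^{a_2-1}e^{-\rho_2 x}}{(a_2-1)!} \tag{6.78}$$

The survivor function is

$$S(x) = \omega_1\,e^{-\rho_1 x}\sum_{i=1}^{a_1}\frac{(\rho_1 x)^{i-1}}{(i-1)!} + \omega_2\,e^{-\rho_2 x}\sum_{i=1}^{a_2}\frac{(\rho_2 x)^{i-1}}{(i-1)!}$$

The mean and the variance are

$$\text{Mean} = \omega_1 a_1/\rho_1 + \omega_2 a_2/\rho_2$$

$$\text{Variance} = \left[\omega_1 a_1(1+a_1)/\rho_1^2 + \omega_2 a_2(1+a_2)/\rho_2^2\right] - \left[\omega_1 a_1/\rho_1 + \omega_2 a_2/\rho_2\right]^2$$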




Fig. 6.8 The state transition diagram of a system with the down state 
represented by two series stages in parallel 



The Expression (6.78) is a mixture of two Special Erlangian distributions 
and can approximate a wider range of distributions than the Special Erlangian. 
The combination has only five independent parameters. The various 
characteristics of this combination are shown in Fig. 6.9. These characteristics 
cover almost all the four types mentioned earlier. The theoretical analysis of the 
shape of the hazard rate function is given in Appendix II. 







Fig. 6.9 Some characteristics of the mixture of two special 
Erlangian distributions 






Series Stages in Series with a Distinctive Stage 

The General Erlangian distribution (6.66) can generate a wider range of
distributions than the Special Erlangian distribution but the number of
parameters involved tends to be large. A special case combining the
advantages of both is a series of identical stages in series with a distinctive
stage as shown in Fig. 6.10. This model has three parameters:
the transition rate $\rho$ of each stage and the number of stages $a$ in the identical
series stages and the transition rate $\rho_1$ for the distinctive stage.

Fig. 6.10 The state transition diagram of a system with the down state
represented by a number of series stages in series with a
distinctive stage.

The probability density function is

$$f(x) = \rho_1\left(\frac{\rho}{\rho-\rho_1}\right)^{a}\left[e^{-\rho_1 x} - e^{-\rho x}\sum_{i=1}^{a}\frac{\{(\rho-\rho_1)x\}^{i-1}}{(i-1)!}\right] \tag{6.79}$$

The survivor function is

$$S(x) = e^{-\rho x}\sum_{i=1}^{a}\frac{(\rho x)^{i-1}}{(i-1)!} + \left(\frac{\rho}{\rho-\rho_1}\right)^{a}\left[e^{-\rho_1 x} - e^{-\rho x}\sum_{i=1}^{a}\frac{\{(\rho-\rho_1)x\}^{i-1}}{(i-1)!}\right] \tag{6.80}$$

and the hazard rate function

$$\phi(x) = f(x)/S(x)$$

The mean of this distribution is $a/\rho + 1/\rho_1$
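The expressions for this combination are easy to evaluate directly. The sketch below assumes the forms $f(x) = \rho_1\{\rho/(\rho-\rho_1)\}^a[e^{-\rho_1 x} - e^{-\rho x}\sum_{i=1}^{a}\{(\rho-\rho_1)x\}^{i-1}/(i-1)!]$ and $S(x) = e^{-\rho x}\sum_{i=1}^{a}(\rho x)^{i-1}/(i-1)! + f(x)/\rho_1$, valid for $\rho \ne \rho_1$, and checks by a trapezoidal rule that the density integrates to one and has mean $a/\rho + 1/\rho_1$. The parameter values and function names are hypothetical.

```python
import math

def _poly_sum(a, rate, x):
    """Sum over i = 1..a of (rate x)^(i-1) / (i-1)!."""
    return sum((rate * x) ** (i - 1) / math.factorial(i - 1) for i in range(1, a + 1))

def _bracket(x, a, rho, rho1):
    """Common bracketed term of the density and survivor expressions."""
    return math.exp(-rho1 * x) - math.exp(-rho * x) * _poly_sum(a, rho - rho1, x)

def pdf(x, a, rho, rho1):
    return rho1 * (rho / (rho - rho1)) ** a * _bracket(x, a, rho, rho1)

def survivor(x, a, rho, rho1):
    erlang_part = math.exp(-rho * x) * _poly_sum(a, rho, x)
    return erlang_part + (rho / (rho - rho1)) ** a * _bracket(x, a, rho, rho1)

def hazard(x, a, rho, rho1):
    return pdf(x, a, rho, rho1) / survivor(x, a, rho, rho1)

def trapezoid(g, upper, steps):
    h = upper / steps
    total = 0.5 * (g(0.0) + g(upper))
    for k in range(1, steps):
        total += g(k * h)
    return total * h

# Hypothetical parameters: three identical stages (rate 2) and a distinctive stage (rate 0.5).
a, rho, rho1 = 3, 2.0, 0.5

area = trapezoid(lambda x: pdf(x, a, rho, rho1), 80.0, 80000)
mean = trapezoid(lambda x: x * pdf(x, a, rho, rho1), 80.0, 80000)
print("integral of f:", area)
print("mean:", mean, "expected:", a / rho + 1.0 / rho1)
print("hazard at x = 1, 5, 20:", [hazard(x, a, rho, rho1) for x in (1.0, 5.0, 20.0)])
```

For large $x$ the hazard rate tends to the smaller of the two rates, here $\rho_1$, since the slowest stage dominates the tail.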




Fig. 6.11 Some characteristics of the distribution associated
with series stages in series with a distinctive stage.
[Figure; horizontal axis: Time, days]

The characteristics of this distribution are illustrated in Fig. 6.11 and
the theoretical analysis of the hazard rate is given in Appendix III.



Series Stages in Series with Two Parallel Stages 

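This combination has a series of identical stages followed by a stage with
probability $\omega_1$ or by another stage with probability $\omega_2$, as shown in Fig. 6.12a.
It has five parameters and the expression for the probability density function is

$$f(x) = \omega_1\rho_1\left(\frac{\rho}{\rho-\rho_1}\right)^{a}\left[e^{-\rho_1 x} - e^{-\rho x}\sum_{i=1}^{a}\frac{\{(\rho-\rho_1)x\}^{i-1}}{(i-1)!}\right] + \omega_2\rho_2\left(\frac{\rho}{\rho-\rho_2}\right)^{a}\left[e^{-\rho_2 x} - e^{-\rho x}\sum_{i=1}^{a}\frac{\{(\rho-\rho_2)x\}^{i-1}}{(i-1)!}\right] \tag{6.81}$$

The survivor function can be expressed by

$$S(x) = e^{-\rho x}\sum_{i=1}^{a}\frac{(\rho x)^{i-1}}{(i-1)!} + \omega_1\left(\frac{\rho}{\rho-\rho_1}\right)^{a}\left[e^{-\rho_1 x} - e^{-\rho x}\sum_{i=1}^{a}\frac{\{(\rho-\rho_1)x\}^{i-1}}{(i-1)!}\right] + \omega_2\left(\frac{\rho}{\rho-\rho_2}\right)^{a}\left[e^{-\rho_2 x} - e^{-\rho x}\sum_{i=1}^{a}\frac{\{(\rho-\rho_2)x\}^{i-1}}{(i-1)!}\right] \tag{6.82}$$

The transition rate function can be found by

$$\phi(x) = f(x)/S(x)$$

The derivation of these expressions and the theoretical analysis of the hazard
rate function is provided in Appendix IV. Comparing Expressions (6.79) and
(6.81) this combination is equivalent to two 'series stages in series with a
distinctive stage' in parallel as shown in Fig. 6.12b. The various characteristics
of this combination are illustrated in Fig. 6.13.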



Determination of Parameters 

After a model is chosen to approximate a distribution, the next problem is to
find the model parameters to fit the distribution. There are generally no explicit
formulae for directly deriving the approximate stage model parameters from
those of the distribution. In many cases, the parameters that will best define an
empirical distribution are not known. The moments can, however, always be
evaluated for any distribution either by exact or approximate methods. A method
of determining the parameters for approximate stage models by matching the
first r moments of the model and the distribution is presented in the following
sections. This method is quite general in application.

Fig. 6.12 The state transition diagram of a system with the down state
represented by:

(a) A series stage in series with two parallel stages

(b) Two 'series stages in series with a distinctive stage' in parallel

The parameters of the stage model are non-linear and implicit functions of its
moments. Conversely, the first r moments for the stage combinations
discussed in this paper can be easily calculated from the parameters. The
Newton-Raphson method of successive approximation is applied to solve for
the parameters from the given moments. This method requires for each stage
of approximation:

1. evaluation of the moments with the given parameters

2. evaluation of the partial derivatives of the moments with respect to each
parameter

Fig. 6.13 Some characteristics of the distribution associated
with series stages in series with two parallel stages.
[Figure; horizontal axis: Time, days]



1 Moment Evaluation for a Combination of Stages 

The probability density functions of the stage models discussed have simple
rational Laplace transforms. The rth moment about zero can be obtained if the
rth derivative of the Laplace transform of the probability density function exists at $s = 0$.
The rth moment $m_r$ of the distribution is

$$m_r = (-1)^r f^{(r)}(0)$$

where

$$f^{(r)}(0) = \left.\frac{d^r f(s)}{ds^r}\right|_{s=0}$$

$f(s)$ being the Laplace transform of the probability density function. Moment
calculations for some of the stage models are given in Appendix V.
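As an illustration of moment evaluation, the sketch below takes the mixture of two Special Erlangian distributions, whose Laplace transform is $\omega_1\{\rho_1/(\rho_1+s)\}^{a_1} + \omega_2\{\rho_2/(\rho_2+s)\}^{a_2}$, and uses the fact that differentiating $\{\rho/(\rho+s)\}^a$ r times at $s = 0$ gives the rth moment $a(a+1)\cdots(a+r-1)/\rho^r$. The numerical values are those of the first row of Table 6.1(b) as read here ($a_1 = 4$, $a_2 = 6$, $\omega_1 = 0.21601$, $\rho_1 = 0.144001$, $\rho_2 = 0.336001$); that column assignment is an assumption. A central difference on the transform is included as a rough cross-check of the first moment.

```python
import math

def erlang_moment(r, a, rho):
    """r-th moment about zero of a Special Erlangian distribution with a stages of rate rho."""
    product = 1.0
    for k in range(r):
        product *= (a + k)
    return product / rho ** r

def mixture_moment(r, w1, a1, rho1, a2, rho2):
    """r-th moment of the mixture w1 * Erlang(a1, rho1) + (1 - w1) * Erlang(a2, rho2)."""
    return w1 * erlang_moment(r, a1, rho1) + (1.0 - w1) * erlang_moment(r, a2, rho2)

def mixture_transform(s, w1, a1, rho1, a2, rho2):
    """Laplace transform of the mixture density."""
    return w1 * (rho1 / (rho1 + s)) ** a1 + (1.0 - w1) * (rho2 / (rho2 + s)) ** a2

# Values read from the first row of Table 6.1(b) (target: mean 20 hr, standard deviation 10 hr).
w1, a1, rho1, a2, rho2 = 0.21601, 4, 0.144001, 6, 0.336001

m1 = mixture_moment(1, w1, a1, rho1, a2, rho2)
m2 = mixture_moment(2, w1, a1, rho1, a2, rho2)
std = math.sqrt(m2 - m1 ** 2)

# First moment as minus the first derivative of the transform at s = 0.
h = 1e-5
m1_numeric = -(mixture_transform(h, w1, a1, rho1, a2, rho2)
               - mixture_transform(-h, w1, a1, rho1, a2, rho2)) / (2.0 * h)

print("mean:", m1, "standard deviation:", std)
print("mean from transform derivative:", m1_numeric)
```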

2 Newton-Raphson Method for Parameter Calculation

The first r moments are matched by successive approximation starting
from the initial parameters. If a model has r parameters, $x_1, x_2, \ldots, x_r$
to be determined by matching the first r moments, the r functions, $\phi_1,
\phi_2, \ldots, \phi_r$ are defined such that

$$\phi_1 = \phi_1(X) = m_1(X) - M_1$$

$$\phi_2 = \phi_2(X) = m_2(X) - M_2$$

$$\vdots$$

$$\phi_r = \phi_r(X) = m_r(X) - M_r$$

where $X$ is the column vector $(x_1\ x_2\ \cdots\ x_r)^t$, $m_r(X)$ is the rth moment of
the stage model and $M_r$ is the rth moment of the distribution to be
approximated. The conditions of exact match of the first r moments are

$$\phi_1 = \phi_2 = \cdots = \phi_r = 0$$

Let $X_0 = (x_{10}\ x_{20}\ \cdots\ x_{r0})^t$ be the vector of the parameters at a
certain stage of approximation, and let $\phi$ be a column vector such that
$\phi = (\phi_1\ \phi_2\ \cdots\ \phi_r)^t$. The correction vector for the parameters,
$\Delta X = (\Delta x_1\ \Delta x_2\ \cdots\ \Delta x_r)^t$, can be calculated from the following matrix
equation by the Gauss elimination method if $\phi(X_0)$ and $\phi'(X_0)$ are known.

$$\phi(X_0) + \phi'(X_0)\,\Delta X = 0$$

where $\phi(X_0)$ is the vector $\phi$ at $X = X_0$ and $\phi'(X_0)$ is the Jacobian matrix
of $\phi$ at $X_0$, i.e.

$$\phi'(X_0) = \begin{pmatrix} \dfrac{\partial\phi_1}{\partial x_1}(X_0) & \dfrac{\partial\phi_1}{\partial x_2}(X_0) & \cdots & \dfrac{\partial\phi_1}{\partial x_r}(X_0) \\ \dfrac{\partial\phi_2}{\partial x_1}(X_0) & \dfrac{\partial\phi_2}{\partial x_2}(X_0) & \cdots & \dfrac{\partial\phi_2}{\partial x_r}(X_0) \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial\phi_r}{\partial x_1}(X_0) & \dfrac{\partial\phi_r}{\partial x_2}(X_0) & \cdots & \dfrac{\partial\phi_r}{\partial x_r}(X_0) \end{pmatrix}$$

The improved parameter values are obtained by $X = X_0 + \Delta X$. The $\phi(X_0)$
can be calculated directly from the first r moments of the model when
$X = X_0$ (Appendix VI).

The method for finding $\phi'(X_0)$ is discussed for some of the stage
models in Appendix VI.
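The iteration can be sketched in a few lines. The example below is illustrative only: it matches the first two moments of the 'series stages in series with a distinctive stage' model with the number of stages $a$ held fixed, so the unknowns are $\rho$ and $\rho_1$. The Jacobian is approximated by central differences, which is a substitute for the analytical derivatives referred to in the text, and the linear system is solved by Gauss elimination with partial pivoting. Target moments, starting values and function names are hypothetical.

```python
def gauss_solve(M, b):
    """Solve M x = b by Gauss elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / aug[r][r]
    return x

def model_moments(params, a=3):
    """First two moments of a identical stages (rate rho) in series with one stage (rate rho1)."""
    rho, rho1 = params
    m1 = a / rho + 1.0 / rho1
    variance = a / rho ** 2 + 1.0 / rho1 ** 2
    return [m1, variance + m1 ** 2]

def newton_match(moments_fn, targets, x0, iterations=30, rel_step=1e-6):
    """Newton-Raphson: solve phi(X) = m(X) - M = 0 starting from x0."""
    x = list(x0)
    r = len(x)
    for _ in range(iterations):
        phi = [m - M for m, M in zip(moments_fn(x), targets)]
        jac = [[0.0] * r for _ in range(r)]
        for j in range(r):
            h = rel_step * max(abs(x[j]), 1.0)
            up, dn = x[:], x[:]
            up[j] += h
            dn[j] -= h
            m_up, m_dn = moments_fn(up), moments_fn(dn)
            for i in range(r):
                jac[i][j] = (m_up[i] - m_dn[i]) / (2.0 * h)
        dx = gauss_solve(jac, [-v for v in phi])    # phi(X0) + phi'(X0) dX = 0
        x = [xi + d for xi, d in zip(x, dx)]        # X = X0 + dX
    return x

# Hypothetical target moments, generated from rho = 2.0 and rho1 = 0.5 with a = 3.
targets = [3.5, 17.0]
fitted = newton_match(model_moments, targets, x0=[2.5, 0.6])
print("fitted rho, rho1:", fitted)
```

As with any Newton-Raphson scheme, convergence depends on a reasonable starting point; a poor initial guess can lead to divergence or to a root with negative rates.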

Example system study 

The technique of stages is applied to a two state unit having the up time
exponentially distributed and down time with a lognormal distribution. The
following numerical values are used

Mean Up Time = 1500 hr
Mean Down Time = 20 hr

The standard deviation of the down time is varied as

(i) 10 hr
(ii) 14.14 hr
(iii) 20 hr

The lognormal distribution is completely specified by its mean and
standard deviation. The expression for a lognormal probability density function
of the random variable $X$ was given in Chapter 2.

$$f(x) = \frac{1}{x\sigma\sqrt{2\pi}}\,e^{-(\log x - m)^2/2\sigma^2}$$

where $m$ and $\sigma$ are the mean and standard deviation of $\log(X)$. The rth moment of $X$ is

$$m_r(X) = E(X^r) = e^{rm + r^2\sigma^2/2} \tag{6.83}$$

The mean is

$$m_x = e^{m+\sigma^2/2} \tag{6.84}$$

and the variance is

$$\sigma_x^2 = e^{2m+\sigma^2}\left(e^{\sigma^2} - 1\right) \tag{6.85}$$

Solving Equations (6.84) and (6.85)

$$\sigma^2 = \log\left[\left(\frac{\sigma_x}{m_x}\right)^2 + 1\right]$$

and

$$m = \log m_x - \tfrac{1}{2}\log\left[\left(\frac{\sigma_x}{m_x}\right)^2 + 1\right]$$

The parameters $m$ and $\sigma$ can therefore be found from the mean $m_x$ and standard
deviation $\sigma_x$ of the log normal distribution. The hazard rate of a lognormal
distribution shows initial positive ageing followed by negative ageing.
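The conversion from the mean and standard deviation of the down time to the lognormal parameters is a short computation. The sketch below assumes the relations $\sigma^2 = \log[(\sigma_x/m_x)^2 + 1]$ and $m = \log m_x - \sigma^2/2$, with natural logarithms, and checks them by recomputing the moments from $E(X^r) = \exp(rm + r^2\sigma^2/2)$. The function names are illustrative.

```python
import math

def lognormal_params(mean_x, sd_x):
    """Return (m, sigma), the mean and standard deviation of log(X)."""
    sigma_sq = math.log((sd_x / mean_x) ** 2 + 1.0)
    m = math.log(mean_x) - 0.5 * sigma_sq
    return m, math.sqrt(sigma_sq)

def lognormal_moment(r, m, sigma):
    """r-th moment about zero of the lognormal distribution."""
    return math.exp(r * m + 0.5 * r * r * sigma * sigma)

# The three cases of the example: mean 20 hr, standard deviation 10, 14.14 and 20 hr.
for sd in (10.0, 14.14, 20.0):
    m, sigma = lognormal_params(20.0, sd)
    mean_back = lognormal_moment(1, m, sigma)
    sd_back = math.sqrt(lognormal_moment(2, m, sigma) - mean_back ** 2)
    print(sd, m, sigma, mean_back, sd_back)
```

The higher moments needed for matching three or more parameters follow from the same `lognormal_moment` expression.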

This suggests two combinations: 

(a) two series stages in parallel 

(b) series stages in series with two parallel stages 

The log normal is approximated by these two combinations using moment 
matching technique and the parameters are listed in Table 6.1 . The transition 
rate functions, probability density functions and the survivor functions of the 
log normal distributions together with the stage combination models are 
shown in Fig. 6.14 



Table 6.1 The parameters of stage combinations for the approximation of
lognormal distributions

The mean of the distribution $m_x$ = 20 hours, $\sigma_x$ = Standard deviation of the
distribution

(a) Approximation by series stages in series with two parallel stages

Lognormal      Approximate model parameters
$\sigma_x$     $a$    $\omega_1$    $\rho$        $\rho_1$      $\rho_2$
(hours)                             per hour      per hour      per hour

10             6      0.10976       0.526715      0.078325      0.123519
14.14          4      0.05270       0.554433      0.036597      0.083496
20             3      0.02250       1.221384      0.016180      0.060515

(b) Approximation by two series stages in parallel

$\sigma_x$     $a_1$  $a_2$  $\omega_1$    $\rho_1$      $\rho_2$
(hours)                                    per hour      per hour

10             4      6      0.21601       0.144001      0.336001
14.14          2      4      0.25464       0.066666      0.241201
20             2      2      0.06699       0.031698      0.118301






Several techniques exist for finding time specific probabilities of the states
of a Markovian process. The probability for the down state was evaluated up to
24 hours assuming the system was initially in the up state. The continuous time
Markov process is approximated by a discrete time process using a small time
interval and the state probabilities are obtained by multiplication of the
transition probability matrix. The results are shown in Table 6.2. The results
compare well with the direct approximate expression derived by assuming that
the time period considered is so short that no more than one forced outage
and repair can occur in it. Denoting the up and down states by 1 and 2
respectively, and assuming $p_1(0) = 1$

$$p_2(t) \approx p_2^a(t) = P(\text{one forced outage and no repair in } t) = \int_0^t \lambda e^{-\lambda x}\,S_d(t-x)\,dx$$

where

$\lambda$ = failure rate
$S_d$ = survivor function of down time

and

$p_2^a(t)$ = the approximate expression for the probability of being in the
down state at time $t$

Assuming the interval $(0, t)$ to be divided into $k$ equal subintervals each of
length $\delta$

$$p_2^a(t) = \sum_{i=1}^{k} P(\text{forced outage in } i\text{th interval})\,P(\text{no repair in interval of length } (k-i)\delta)$$

$$= \sum_{i=1}^{k}\left[e^{-\lambda(i-1)\delta} - e^{-\lambda i\delta}\right]S_d\{(k-i)\delta\}$$

when $\lambda t \ll 1$

$$e^{-\lambda t} = 1 - \lambda t$$

Therefore

$$p_2^a(t) = \lambda\delta\sum_{i=1}^{k} S_d\{(k-i)\delta\} = \lambda\delta\sum_{i=1}^{k} S_d\{(i-1)\delta\} = \lambda\delta\sum_{i=0}^{k-1} S_d(i\delta)$$

It may be necessary to find the state probability values at the end of each
equal time interval, i.e. in a 24 hr period, calculate $p_2^a(t)$ at the end of each
hour. The probability at the end of the nth interval is given by

$$p_2^a(nm\delta) = p_2^a\{(n-1)m\delta\} + \lambda\delta\sum_{i=(n-1)m}^{nm-1} S_d(i\delta)$$

where $m$ is the number of subintervals in each time interval and $\delta$ is the length
of each subinterval, the length of each interval therefore being $m\delta$. A closed
form expression for the survivor function of the lognormal distribution does not
exist and Simpson's Rule can be applied to calculate it by integrating the
probability density function. The following equation can be used to calculate
$S_d(i\delta)$ successively

$$S_d(i\delta) = S_d\{(i-1)\delta\} - \frac{\delta}{6l}\sum_{j=0}^{l-1}\left[f\left\{(i-1)\delta + \frac{j\delta}{l}\right\} + 4f\left\{(i-1)\delta + \frac{(j+\tfrac{1}{2})\delta}{l}\right\} + f\left\{(i-1)\delta + \frac{(j+1)\delta}{l}\right\}\right]$$



where $l$ is the number of divisions in each subinterval for calculation of the
survivor function by Simpson's Rule. This method was developed for computer 
application without requiring excessive storage. Down state probabilities are 
evaluated up to 24 hours with a one hour interval. They are listed in Table 6.2 
together with those obtained by the device of stages. It can be seen that the 
results of the two approximate methods are quite close to each other. 
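The recursive calculation can be sketched in a few lines of Python. This is only an illustrative sketch, not the original program: the function names, the lognormal parameters and the default number of Simpson divisions are assumptions made here. Only the current value of the survivor function is kept in memory, which is the storage saving referred to in the text.

```python
import math

def lognormal_pdf(x, mu, sigma):
    """Lognormal density; mu and sigma are the mean and standard deviation
    of ln(down time). Both are illustrative parameters, not book values."""
    if x <= 0.0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

def down_state_probabilities(lam, pdf, delta, m, n_intervals, l_div=4):
    """Approximate down state probability at the end of each interval.

    lam         failure rate
    pdf         probability density function of the down time
    delta       length of a subinterval
    m           number of subintervals per interval
    n_intervals number of intervals (e.g. 24 for hourly values over a day)
    l_div       Simpson divisions per subinterval (the l of the text)
    """
    h = delta / l_div
    surv = 1.0          # S_d(0)
    p = 0.0             # approximate down state probability
    results = []
    for n in range(1, n_intervals + 1):
        for i in range((n - 1) * m, n * m):
            p += lam * delta * surv          # adds lam * delta * S_d(i * delta)
            # Advance the survivor function from i*delta to (i+1)*delta
            # by Simpson's Rule applied to each of the l_div divisions.
            base = i * delta
            area = 0.0
            for j in range(l_div):
                a = base + j * h
                area += pdf(a) + 4.0 * pdf(a + 0.5 * h) + pdf(a + h)
            surv -= h / 6.0 * area
        results.append(p)
    return results

if __name__ == "__main__":
    # Hypothetical data: 0.001 failures/hr, lognormal down time in hours.
    hourly = down_state_probabilities(
        0.001, lambda x: lognormal_pdf(x, 1.0, 0.8), delta=0.01, m=100, n_intervals=24)
    for hour, value in enumerate(hourly, start=1):
        print(hour, value)
```

Passing the density as a function keeps the routine independent of the particular down time distribution, so the same loop can be checked against a distribution whose survivor function is known in closed form.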



Application to The Transformer Bank Problem 

Reliability Modelling for Three Single Phase Transformers With One Spare, 
General distributions for Repair and Change Out, Restricted Repair, Using 
Stages in Parallel. 

The expressions for this model assuming general distributions have already 
been developed. However, to illustrate the modelling approach when the 
stages are in parallel, the state transition diagram is developed in Fig. 6.15. 
Since the change out operation is restricted to a single stage, from the point of
view of steady state analysis, the change out rate can be simply represented by
an average value $\gamma$.

The probability density function for the repair time is

$$\omega_1 \mu_1 e^{-\mu_1 x} + \omega_2 \mu_2 e^{-\mu_2 x} \quad \text{with mean} = \omega_1/\mu_1 + \omega_2/\mu_2 \tag{6.86}$$



The state transition diagram in Fig. 6.15 can be clarified by referring to the
discussion related to Fig. 6.7. The first letter of the state numbers represents
the system state as given in Fig. 6.2, and the second letter, if any, denotes the
stage number. The steady state equations for this model can be written in the
conventional manner.

[Table 6.2: down state probabilities at one hour intervals up to 24 hours, obtained by the device of stages and by the approximate expression. The numerical entries are not recoverable from the source.]

Fig. 6.15 The state transition diagram for three single phase transformers
with one spare, assuming restricted repair and two stages in
parallel for the repair time.

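The solution of these equations gives

$$p_1 = \frac{1}{Z} \quad \text{where} \quad Z = 1 + \frac{3\lambda}{E}\left(\frac{1}{\gamma} + \frac{1}{\mu}\right)$$

where

$$E = \frac{\omega_1\mu_1}{3\lambda + \mu_1} + \frac{\omega_2\mu_2}{3\lambda + \mu_2} \quad \text{and} \quad \frac{1}{\mu} = \frac{\omega_1}{\mu_1} + \frac{\omega_2}{\mu_2}$$

also

$$p_2 = p_{21} + p_{22} = \frac{1 - E}{ZE}, \quad p_3 = \frac{3\lambda}{\gamma E Z} \quad \text{and} \quad p_4 = p_{41} + p_{42} = \frac{9\lambda^2}{ZE}\left(\frac{\omega_1}{\mu_1(3\lambda + \mu_1)} + \frac{\omega_2}{\mu_2(3\lambda + \mu_2)}\right)$$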
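A small numerical sketch of this closed-form solution follows. The function name and all parameter values are hypothetical; the expressions coded are those for $E$, $Z$ and $p_1$ to $p_4$ of the two-stage parallel repair model, with the weights $\omega_1 + \omega_2 = 1$ assumed. The sum of the four probabilities serves as a consistency check.

```python
def parallel_stage_steady_state(lam, gamma, w1, mu1, w2, mu2):
    """Steady state probabilities p1..p4 for the restricted repair model
    with two repair stages in parallel (weights w1 + w2 = 1 assumed)."""
    E = w1 * mu1 / (3 * lam + mu1) + w2 * mu2 / (3 * lam + mu2)
    inv_mu = w1 / mu1 + w2 / mu2            # mean repair time, 1/mu
    Z = 1 + (3 * lam / E) * (1 / gamma + inv_mu)
    p1 = 1 / Z
    p2 = (1 - E) / (Z * E)
    p3 = 3 * lam / (gamma * E * Z)
    p4 = 9 * lam ** 2 / (Z * E) * (
        w1 / (mu1 * (3 * lam + mu1)) + w2 / (mu2 * (3 * lam + mu2)))
    return p1, p2, p3, p4

if __name__ == "__main__":
    # Hypothetical rates per year, chosen only for illustration.
    probs = parallel_stage_steady_state(
        lam=0.008, gamma=730.0, w1=0.4, mu1=1.0, w2=0.6, mu2=4.0)
    print(probs, sum(probs))
```

With $\omega_1 + \omega_2 = 1$ the four values should sum to one, which is a quick way to catch a transcription slip in any of the expressions.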



Three Single Phase Transformers With a Single Spare, Unrestricted Repair,
Erlangian Distribution for Repair and Reinstallation

The state transition diagram shown in Fig. 6.16 assumes three stages for
each of the repair and reinstallation periods, i.e. the index $\alpha$ in Equation
(6.68) is 3. The transition diagrams for higher $\alpha$ can be similarly developed and
the numerical answers when $\alpha$ is a non-integer (i.e. when Equation (6.69) is
assumed as the distribution) can be obtained by interpolation. A computer
program can be written which can generate the transition rate matrix for any
$\alpha$ without having to draw the state transition diagram. The reliability
evaluation can, therefore, be made conveniently for any value of $\alpha$.



202 System Reliability Modelling and Evaluation 




The repair and the change out rates associated with each stage are assumed as
$\rho$ and $\beta$ respectively such that $\rho = \alpha\mu$ and $\beta = \alpha\gamma$. The first letter of the state
number in Fig. 6.16 indicates the system state as shown in Fig. 6.1 and the
remaining letters denote the stages of the operations being carried out. For
example, 22 means that the system is in state 2 (the transformer bank is up and
no spare) and the repair is in the second stage. Similarly, 412 indicates state 4
(transformer bank failed and no spare) and the repair on one of the
transformers is in the first stage and the other transformer in the second stage.
As another example, 323 denotes state 3 (transformer bank down and one
spare) and the repair is in the second stage whereas the change out operation is
in stage 3. The relationship between the state and stage availabilities is

$$P_1 = p_1, \quad P_2 = \sum_{i=1}^{\alpha} p_{2i}, \quad P_3 = \sum_{i=1}^{\alpha}\sum_{j=1}^{\alpha} p_{3ij},$$

$$P_4 = \sum_{i=1}^{\alpha}\sum_{j=1}^{\alpha} p_{4ij}, \quad P_5 = \sum_{i=1}^{\alpha} p_{5i}$$






The steady state equations for Fig. 6.16 can be written in the form

$$AP = B \tag{6.87}$$

where

$B$ = a column vector with all zeros

$P$ = the vector of steady state probabilities

and

$A$ = the transpose of the transition rate matrix

Any $(n - 1)$ equations out of the $n$ in (6.87), $n$ being the number of states,
can be solved with the normalizing equation

$$\sum_{i=1}^{n} p_i = 1$$

to give the steady state probabilities.
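The solution procedure can be sketched as follows. This is a minimal illustration, not the program referred to in the text: the helper names are hypothetical, and the generated diagram is for a single repairable component with $\alpha$ repair stages (each with rate $\alpha\mu$) rather than for the transformer bank of Fig. 6.16. The last equation of $AP = B$ is replaced by the normalizing equation and the system is solved by Gaussian elimination.

```python
def steady_state(rates, n):
    """Solve A P = B with one equation replaced by sum(p) = 1.

    rates: dict {(i, j): transition rate from state i to state j}
    n:     number of states, numbered 0 .. n-1
    """
    # Transition rate matrix Q: off-diagonal Q[i][j] = rate i -> j,
    # diagonal = minus the sum of the departure rates.
    Q = [[0.0] * n for _ in range(n)]
    for (i, j), r in rates.items():
        Q[i][j] += r
        Q[i][i] -= r
    # A is the transpose of the transition rate matrix, B is all zeros.
    A = [[Q[j][i] for j in range(n)] for i in range(n)]
    B = [0.0] * n
    # Replace the last equation by the normalizing equation.
    A[n - 1] = [1.0] * n
    B[n - 1] = 1.0
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        B[col], B[piv] = B[piv], B[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            B[r] -= f * B[col]
    # Back substitution.
    P = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][c] * P[c] for c in range(r + 1, n))
        P[r] = (B[r] - s) / A[r][r]
    return P

def erlang_component_rates(lam, mu, alpha):
    """State 0 is up; states 1..alpha are repair stages in series,
    each left at rate alpha * mu so that the mean repair time is 1/mu."""
    rates = {(0, 1): lam}
    for k in range(1, alpha + 1):
        rates[(k, k + 1 if k < alpha else 0)] = alpha * mu
    return rates

if __name__ == "__main__":
    for alpha in (1, 3, 5):
        P = steady_state(erlang_component_rates(0.01, 0.5, alpha), alpha + 1)
        print(alpha, 1.0 - P[0])   # probability of being in a repair stage
```

Generating the rates from a rule, as `erlang_component_rates` does, is what allows the number of stages to be changed without redrawing the diagram.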

Numerical Results

These studies were conducted to determine the effect of the distribution form
on the steady state probabilities. The exponential distribution was assumed for
the up time and the repair and change out periods were assumed to have the
Special Erlangian distribution of Equation (6.68) with the same shape
parameter $\alpha$.

Restricted Repair

The failure rate was taken as 0.008 failures per year and the effect of
variation of $\alpha$ on the probability of being in the down state was determined
under different values for the mean repair and change out times using Equation
(6.30). The results are shown in Tables 6.3 and 6.4. It should be noted that
the mean values are held constant so that the difference in values is entirely due
to the change in the shape of the distribution. The values with $\alpha = 1$ correspond
to the exponential distribution and the limiting values (LV) refer to
constant repair and reinstallation times, i.e. when $\alpha \to \infty$ with the mean values
kept constant.

Table 6.3 shows the actual values of the steady state unavailability and Table
6.4 shows the percentage variation from the exponential when the value of $\alpha$
is increased. It can be seen that the value of $\alpha$ has a considerable effect on the
unavailability of the transformer bank. The variation from the exponential
depends both upon the mean repair time (MRT) and the mean change out time
(MIT). For a given mean change out time, the greater the mean repair time the
more pronounced is the variation from the exponential and for a given mean
repair time, the greater the mean change out time the less pronounced the
variation.



Table 6.3 The effect of α on the unavailability of the transformer bank

        M.R.T. = 182.5 Days                     M.R.T. = 20 Days
α       M.I.T. = 0.5        M.I.T. = 3.5        M.I.T. = 0.5        M.I.T. = 3.5
        (Days)              (Days)              (Days)              (Days)

1       0.175139 × 10^-3    0.372291 × 10^-3    0.346026 × 10^-4    0.231810 × 10^-3
2       0.140000 × 10^-3    0.337166 × 10^-3    0.341714 × 10^-4    0.231379 × 10^-3
3       0.128224 × 10^-3    0.325395 × 10^-3    0.340276 × 10^-4    0.231236 × 10^-3
4       0.122325 × 10^-3    0.319498 × 10^-3    0.339557 × 10^-4    0.231164 × 10^-3
5       0.118781 × 10^-3    0.315956 × 10^-3    0.339126 × 10^-4    0.231121 × 10^-3
LV      0.104546 × 10^-3    0.301726 × 10^-3    0.337212 × 10^-4    0.230929 × 10^-3



Table 6.4 The percent variation from the values of exponential

        M.R.T. = 182.5 Days             M.R.T. = 20 Days
α       M.I.T. = 0.5    M.I.T. = 3.5    M.I.T. = 0.5    M.I.T. = 3.5
        (Days)          (Days)          (Days)          (Days)

1       0.00            0.00            0.00            0.00
2       20.06           9.43            1.25            0.18
3       26.79           12.60           1.66            0.25
4       30.16           14.18           1.87            0.28
5       32.18           15.13           1.99            0.30
LV      40.31           18.95           2.55            0.38
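
The entries of Table 6.4 follow directly from those of Table 6.3. As a short check, the sketch below recomputes the first column (M.R.T. = 182.5 days, M.I.T. = 0.5 days) as the percentage reduction relative to the exponential case; the function and variable names are illustrative only.

```python
# First column of Table 6.3: unavailability for each shape parameter.
unavailability = {
    1: 0.175139e-3,
    2: 0.140000e-3,
    3: 0.128224e-3,
    4: 0.122325e-3,
    5: 0.118781e-3,
    "LV": 0.104546e-3,
}

def percent_variation(values, reference_key=1):
    """Percentage by which each value falls below the reference
    (exponential) value."""
    ref = values[reference_key]
    return {key: 100.0 * (ref - v) / ref for key, v in values.items()}

if __name__ == "__main__":
    for key, pct in percent_variation(unavailability).items():
        print(key, round(pct, 2))
```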



Unrestricted Repair 

A similar study was conducted for unrestricted repair using the computer
program which generates the transition rate matrix for different values of $\alpha$
and evaluates the various steady state probabilities. The variation of the
unavailability with the increase in $\alpha$ is shown in Table 6.5A and similar
variations in the states constituting the failure state, i.e. $p_3$, $p_4$ and $p_5$, are
shown in Tables 6.5B-D. The probability density functions for both repair
and change out are assumed to be Special Erlangian.

The exponential distribution, as in the case of restricted repair, does
overestimate the unavailability but the variation is insignificant. The individual
components of the failure state show an interesting behaviour; $p_3$ shows an
increase with increasing $\alpha$ whereas $p_4$ decreases. The only component which
shows a large variation is $p_5$; its magnitude is, however, relatively quite small.
The large variation in the value of $p_5$ is explained by the fact that an increase
in $\alpha$ is accompanied by a rapid decrease of dispersion of the repair and change
out times which results in a lower probability of being repaired while the change
out is in progress. The overall variation in the unavailability is, however, quite
small.
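
The decrease in dispersion can be quantified with a short sketch. It assumes, consistently with the stage rates used above, that the Special Erlangian time is the sum of $\alpha$ independent exponential stages, each with rate $\alpha$ divided by the mean; the function name and the 20 day mean are illustrative. Under that assumption the coefficient of variation is $1/\sqrt{\alpha}$ while the mean stays fixed.

```python
import math

def special_erlang_moments(mean, alpha):
    """Mean, variance and coefficient of variation of a sum of alpha
    exponential stages, each with rate alpha / mean (an assumption)."""
    stage_rate = alpha / mean
    total_mean = alpha * (1.0 / stage_rate)          # stage means add
    total_var = alpha * (1.0 / stage_rate ** 2)      # stage variances add
    return total_mean, total_var, math.sqrt(total_var) / total_mean

if __name__ == "__main__":
    for alpha in (1, 2, 3, 4, 5):
        print(alpha, special_erlang_moments(20.0, alpha))
```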

Two Three Phase Transformers in Parallel, Special Erlangian Distribution for 
Repair and Change Out Periods 

The combination of stages to approximate the probability density function
depends upon the available information. However, a model has been developed
in Fig. 6.17, assuming the Special Erlangian distribution for repair and change
out because by suitably varying the parameter $\alpha$ the behaviour of a large number
of probability density functions less dispersed than the exponential can be
approximated. The same symbols and notation as in Fig. 6.16 have been used.
The transition diagram shown is for three stages. A computer program can be
easily written to generate the transition rate matrix for any value of $\alpha$. The
numerical values for time dependent or steady state probabilities can then be
obtained.

Table 6.5 The variation in the probability of the failure state and its
constituent states

        M.R.T. = 182.5 Days                     M.R.T. = 20 Days
α       M.I.T. = 0.5        M.I.T. = 3.5        M.I.T. = 0.5        M.I.T. = 3.5
        (Days)              (Days)              (Days)              (Days)

A. Variation in Unavailability

1       0.103818 × 10^-3    0.299854 × 10^-3    0.337181 × 10^-4    0.230819 × 10^-3
2       0.103769 × 10^-3    0.299513 × 10^-3    0.337126 × 10^-4    0.230780 × 10^-3
3       0.103745 × 10^-3    0.299344 × 10^-3    0.337099 × 10^-4    0.230760 × 10^-3
4       0.103731 × 10^-3    0.299237 × 10^-3    0.337081 × 10^-4    0.230748 × 10^-3
5       0.103720 × 10^-3    0.299164 × 10^-3    0.337068 × 10^-4    0.230739 × 10^-3

B. Variation in p_3

1       0.327835 × 10^-4    0.225739 × 10^-3    0.320738 × 10^-4    0.195816 × 10^-3
2       0.328720 × 10^-4    0.229870 × 10^-3    0.328363 × 10^-4    0.220615 × 10^-3
3       0.328726 × 10^-4    0.230027 × 10^-3    0.328726 × 10^-4    0.226905 × 10^-3
4       0.328726 × 10^-4    0.230037 × 10^-3    0.328748 × 10^-4    0.228900 × 10^-3
5       0.328726 × 10^-4    0.230039 × 10^-3    0.328749 × 10^-4    0.229609 × 10^-3

C. Variation in p_4

1       0.709445 × 10^-4    0.697866 × 10^-4    0.842477 × 10^-6    0.734782 × 10^-6
2       0.708962 × 10^-4    0.694447 × 10^-4    0.836972 × 10^-6    0.695249 × 10^-6
3       0.708722 × 10^-4    0.692747 × 10^-4    0.834234 × 10^-6    0.674866 × 10^-6
4       0.708574 × 10^-4    0.691686 × 10^-4    0.832526 × 10^-6    0.662111 × 10^-6
5       0.708467 × 10^-4    0.690945 × 10^-4    0.831331 × 10^-6    0.653233 × 10^-6

D. Variation in p_5

1       0.898177 × 10^-7    0.432924 × 10^-5    0.801844 × 10^-6    0.342618 × 10^-4
2       0.128310 × 10^-8    0.197985 × 10^-6    0.393765 × 10^-7    0.846884 × 10^-5
3       0.713079 × 10^-9    0.415356 × 10^-7    0.301219 × 10^-8    0.318000 × 10^-5
4       0.665219 × 10^-9    0.323045 × 10^-7    0.808461 × 10^-9    0.118599 × 10^-5
5       0.638378 × 10^-9    0.305947 × 10^-7    0.636642 × 10^-9    0.477498 × 10^-6




Fig. 6.17 The state transition diagram for two three phase half capacity 
transformers in parallel assuming the special Erlangian 
distribution for repair and change out times. 



The Special Case of Independent Components 

If a system is composed of independent binary components, the associated
stochastic process is a superposition of independent alternating renewal
processes. It was proved in Chapter 3 for an equilibrium alternating renewal
process that irrespective of the probability density function of the up and down
time durations, the frequency of encountering the up and down states is given by

$$f_u = f_d = \frac{1}{T_u + T_d}$$

where $T_u$ and $T_d$ are the mean up and down times. It is also known that for an
equilibrium alternating renewal process, the
probabilities of being in the up and down state respectively are given by

$$P_u = \frac{T_u}{T_u + T_d}$$

and

$$P_d = \frac{T_d}{T_u + T_d}$$

The transition rates from up to down, $\lambda$, and down to up, $\mu$, are therefore
given by

$$\lambda = f_d / P_u = \frac{1}{T_u} \quad \text{and} \quad \mu = f_u / P_d = \frac{1}{T_d}$$




It is therefore clear that under steady state conditions, the interstate transition
rates of an alternating renewal process can be represented by the reciprocals of
the respective mean state durations. Since the different alternating renewal
processes are independent, the steady state probabilities and frequencies will be
unaffected by the forms of the probability density functions of the state
durations provided the transition rates are represented by the reciprocals of the
mean component state durations.
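
As a brief worked illustration with purely hypothetical values, take a component with mean up time $T_u = 100$ h and mean down time $T_d = 10$ h, whatever the shapes of the two distributions. Then

$$f_u = f_d = \frac{1}{100 + 10} \approx 0.00909 \text{ per hour}, \quad P_u = \frac{100}{110} \approx 0.909, \quad P_d = \frac{10}{110} \approx 0.0909$$

$$\lambda = \frac{f_d}{P_u} = \frac{1/110}{100/110} = 0.01 \text{ per hour}, \quad \mu = \frac{f_u}{P_d} = \frac{1/110}{10/110} = 0.1 \text{ per hour}$$

so the equivalent rates are simply $1/T_u$ and $1/T_d$.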



Reliability Modelling Using Complex Transition Rates 

It has been noted that when complex transition rates are allowed, any
distribution having a rational Laplace transform can, in principle, be treated
using the device of stages. The use of complex transition rates is illustrated for a
component whose up time is exponentially distributed with rate parameter
$\lambda$ and whose down time has the density function

$$f(t) = \frac{a(a^2 + b^2)}{b^2}\, e^{-at}(1 - \cos bt) \tag{6.88}$$

The Laplace transform is

$$\bar{f}(s) = \frac{a}{a + s} \cdot \frac{a^2 + b^2}{(a + s)^2 + b^2} = \frac{a}{a + s} \cdot \frac{a + ib}{(a + ib) + s} \cdot \frac{a - ib}{(a - ib) + s}$$

This expression is the product of the Laplace transforms of three functions as shown below

Laplace transform                    Function

$\dfrac{a}{a + s}$                   $a e^{-at}$

$\dfrac{a + ib}{(a + ib) + s}$       $(a + ib) e^{-(a + ib)t}$

$\dfrac{a - ib}{(a - ib) + s}$       $(a - ib) e^{-(a - ib)t}$



Equation (6.88), therefore, is the probability density function of the random
variable which is the sum of three random variables having exponential
distributions. The state transition diagram of this component is, therefore, as
shown in Fig. 6.18. The state differential equations can be written as below

Fig. 6.18 State transition diagram of a component having an exponential
up time distribution and a down time distribution given by
Equation (6.88)

$$p_0'(t) = -\lambda p_0(t) + (a - ib)\,p_3(t)$$
$$p_1'(t) = -a\,p_1(t) + \lambda p_0(t)$$
$$p_2'(t) = -(a + ib)\,p_2(t) + a\,p_1(t)$$
$$p_3'(t) = -(a - ib)\,p_3(t) + (a + ib)\,p_2(t)$$

Assuming $p_0(0) = 1$, the Laplace transforms of the above equations are

$$s\,p_0(s) - 1 = -\lambda p_0(s) + (a - ib)\,p_3(s) \tag{6.89}$$

$$s\,p_1(s) = -a\,p_1(s) + \lambda p_0(s) \tag{6.90}$$

$$s\,p_2(s) = -(a + ib)\,p_2(s) + a\,p_1(s) \tag{6.91}$$

$$s\,p_3(s) = -(a - ib)\,p_3(s) + (a + ib)\,p_2(s) \tag{6.92}$$

From Equations (6.90)-(6.92)

$$p_1(s) = \frac{\lambda}{s + a}\,p_0(s) \tag{6.93}$$

$$p_2(s) = \frac{a}{s + a + ib}\,p_1(s) = \frac{a\lambda}{(s + a)(s + a + ib)}\,p_0(s) \tag{6.94}$$

$$p_3(s) = \frac{a + ib}{s + a - ib}\,p_2(s) = \frac{(a + ib)\,a\lambda}{(s + a - ib)(s + a + ib)(s + a)}\,p_0(s) \tag{6.95}$$

Substituting (6.95) into (6.89)

$$(s + \lambda)\,p_0(s) = 1 + (a - ib)\,p_3(s) = 1 + \frac{(a - ib)(a + ib)\,a\lambda}{(s + a - ib)(s + a + ib)(s + a)}\,p_0(s)$$

$$p_0(s)\left[(s + \lambda) - \frac{(a^2 + b^2)\,a\lambda}{(s + a)((s + a)^2 + b^2)}\right] = 1$$

Therefore

$$p_0(s) = \frac{(s + a)((s + a)^2 + b^2)}{s\{[(s + a)^2 + b^2][s + a + \lambda] + a\lambda(s + 2a)\}}$$

$$p_1(s) = \frac{\lambda((s + a)^2 + b^2)}{s\{[(s + a)^2 + b^2][s + a + \lambda] + a\lambda(s + 2a)\}}$$

$$p_2(s) = \frac{a\lambda(s + a - ib)}{s\{[(s + a)^2 + b^2][s + a + \lambda] + a\lambda(s + 2a)\}}$$

$$p_3(s) = \frac{a\lambda(a + ib)}{s\{[(s + a)^2 + b^2][s + a + \lambda] + a\lambda(s + 2a)\}}$$

$$p_{DN}(s) = p_1(s) + p_2(s) + p_3(s)$$

$$= \frac{\lambda((s + a)^2 + b^2) + a\lambda(s + a - ib) + a\lambda(a + ib)}{s\{[(s + a)^2 + b^2][s + a + \lambda] + a\lambda(s + 2a)\}}$$

$$= \frac{\lambda((s + a)^2 + b^2) + a\lambda(s + 2a)}{s\{[(s + a)^2 + b^2][s + a + \lambda] + a\lambda(s + 2a)\}}$$

It can be seen that $p_2(s)$ and $p_3(s)$ when inverted will yield complex probabilities
but $p_{DN}(t)$ will be real.

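Steady State

$$p_{DN}(\infty) = \lim_{s \to 0} s\,p_{DN}(s)$$

$$= \frac{\lambda(a^2 + b^2) + 2a^2\lambda}{(a^2 + b^2)(a + \lambda) + 2a^2\lambda}$$

$$= \frac{\lambda(a^2 + b^2 + 2a^2)}{\lambda(a^2 + b^2 + 2a^2) + a(a^2 + b^2)}$$

$$= \frac{\lambda}{\lambda + \dfrac{a(a^2 + b^2)}{b^2 + 3a^2}}$$

$$= \frac{\lambda}{\lambda + \mu}$$

where

$$\mu = \frac{a(a^2 + b^2)}{b^2 + 3a^2}$$

It can be proved that $\mu = 1/T_d$,
where $T_d$ = the mean down time.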
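
The steady state result can be checked numerically with complex arithmetic. The sketch below is illustrative only (the function name and the values of $\lambda$, $a$ and $b$ are assumptions): it solves the steady state balance equations of the four-state diagram with the complex stage rates $a$, $a + ib$ and $a - ib$, and shows that the individual stage probabilities are complex while their sum is real.

```python
def complex_stage_steady_state(lam, a, b):
    """Steady state of the up state (0) and the three down stages (1, 2, 3)
    whose departure rates are a, a + ib and a - ib."""
    r1, r2, r3 = complex(a, 0.0), complex(a, b), complex(a, -b)
    # Balance equations, written relative to an unnormalized p0 = 1:
    #   r1 * p1 = lam * p0,  r2 * p2 = r1 * p1,  r3 * p3 = r2 * p2
    p0 = complex(1.0, 0.0)
    p1 = lam * p0 / r1
    p2 = r1 * p1 / r2
    p3 = r2 * p2 / r3
    total = p0 + p1 + p2 + p3
    probs = [p / total for p in (p0, p1, p2, p3)]
    p_down = probs[1] + probs[2] + probs[3]
    return probs, p_down

if __name__ == "__main__":
    lam, a, b = 0.1, 2.0, 3.0                      # hypothetical values
    probs, p_down = complex_stage_steady_state(lam, a, b)
    mu = a * (a * a + b * b) / (b * b + 3 * a * a)
    print(probs)                                   # stages 2 and 3 are complex
    print(p_down, lam / (lam + mu))                # the sum is real
```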



References 

1. D.R. Cox, The Analysis of non-Markovian Stochastic Processes by the
Inclusion of Supplementary Variables, Proc. Camb. Phil. Soc., 51, pp. 433-41
(1955).

2. D.R. Cox, A Use of Complex Probabilities in the Theory of Stochastic
Processes, Proc. Camb. Phil. Soc., 51, pp. 313-19 (1955).

3. Sheldon Ross, Applied Probability Models with Optimization Applications,
Holden-Day, San Francisco (1970).

4. C. Singh, R. Billinton, Reliability Modelling in Systems with non-Exponential
Down Time Distributions, IEEE Trans., PAS-92, No. 2, pp. 790-800
(March/April, 1973).

5. C. Singh, R. Billinton and S.Y. Lee, Reliability Modelling Using the Device of
Stages, PICA Proceedings (1973).

6. C. Singh, Reliability Modelling and Evaluation in Electric Power Systems,
Ph.D. Thesis, University of Saskatchewan, Saskatoon, Canada (1972).

7. Electric Power Institute of Texas A & M University, Methods of Bulk Power
System Security Assessment (Probability Approach), Edison Electric
Institute Project RP 90-6 (Nov. 1970).



CHAPTER 7 

Simulation 



Introduction 

The previous chapters were concerned with formulating and solving mathematical 
models for system reliability. The solution may be either in the form of an 
explicit expression or the results may be obtained by numerical methods. This 
type of approach portrays the cause and effect relationships in the physical 
system and enhances the insight. Methods have been described both for 
Markovian and non-Markovian systems. It is obvious, however, that problems 
involving non-exponential distributions can become very complicated. The 
analytical approach is efficient and should always be employed when it is 
possible to develop a model which is a reasonable representation of the physical 
system and also when such a model is amenable to solution. Some problems 
are, however, too complex to be solved in this manner and simulation 
techniques have to be used. 

In simulation, the system is divided into elements whose behaviour can be 
predicted either deterministically or by probability distributions. These elements 
are then combined to determine the system reliability. Simulation, therefore, 
also employs a mathematical model but it proceeds by performing sampling 
experiments on this mathematical model. Simulation experiments are virtually 
the same as ordinary statistical experiments except that they are performed on 
the mathematical model rather than on the actual system. 

Simulation is an imprecise technique by virtue of its very statistical nature.
Mathematical methods discussed in the previous chapters generally give exact
results under the assumptions made. Simulation techniques, however, provide
only estimates of the exact results. Moreover they provide only a numerical
value and to obtain another numerical value for a different set of parameters,
the whole simulation experiment may have to be repeated. Sensitivity analysis
using a simulation approach is therefore quite expensive. It is, however, a very
flexible approach and for many problems may be the only answer.

It should be noted that this book deals only with digital simulation. It is called
digital because most often it is executed on the digital computer but there is
no inherent relationship between the two. It is a vast field and a separate book is
needed to do justice to it. This book discusses the basics on which the reader
can build further simulation programs.



Basic Procedure 

It has been previously noted that simulation experiments are similar to ordinary 
statistical experiments except that they use a mathematical model of the 
system rather than the physical system itself. This is illustrated with the help 
of an example of two independent components in parallel. The system is failed 
when both the components are failed. The system could be constructed and 
operated for a long time. The history of operation and failure of the system 
could be recorded and the different reliability measures obtained using 
statistical methods. Such a method would be very expensive, especially where 
costly equipment is involved and would require a long time before any 
statement could be made about its reliability. The simulation of this system is
conducted by making a mathematical model in which the behaviour of the
components is represented by probability distributions. Assume that component
1 is in the up state at the beginning of the experiment. Using a random number
and the probability distribution of the up time of component 1, the time at
which this component will fail is determined. Methods for doing this are
explained later in the chapter. In a similar manner a possible duration of the
repair time is generated. A history of the component generated in this 
manner is one possible realization of the stochastic process. The realization of 
component 2 is also constructed and the overlapping outage durations 
represent the durations of the system failure. A number of realizations of the 
system history can be constructed in this manner and the reliability measures 
obtained from these realizations using statistical methods. 

In essence, simulation consists of constructing realizations of the stochastic 
process underlying the system and then extracting the required system 
performance parameters from these realizations. Most of the refinements in the 
theory of simulation are concerned either with developing more efficient 
methods of constructing realizations or extracting the information from the 
least possible number of realizations. 
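
The two-component example can be sketched as a short program. This is a minimal illustration under stated assumptions: both components are given exponential up and down times (any other sampling function could be substituted), the function names are hypothetical, and Python's built-in generator is used in place of a table of random numbers. Each component history is built as a list of outage intervals, and the overlaps of the two lists are the system failures.

```python
import random

def simulate_parallel_pair(draw_up, draw_down, horizon, seed=1):
    """Estimate unavailability and failure frequency of two identical
    independent components in parallel from one long realization."""
    rng = random.Random(seed)

    def outages():
        # One realization of a component: alternate up and down durations
        # and record the (start, end) of every outage within the horizon.
        t, out = 0.0, []
        while t < horizon:
            t += draw_up(rng)
            start = t
            t += draw_down(rng)
            if start < horizon:
                out.append((start, min(t, horizon)))
        return out

    a, b = outages(), outages()
    i = j = 0
    down_time, failures = 0.0, 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:                      # overlapping outages: system failed
            down_time += hi - lo
            failures += 1
        if a[i][1] < b[j][1]:            # move past the outage that ends first
            i += 1
        else:
            j += 1
    return down_time / horizon, failures / horizon

if __name__ == "__main__":
    # Hypothetical data: mean up time 10, mean down time 1 (same time unit).
    unavailability, frequency = simulate_parallel_pair(
        lambda r: r.expovariate(0.1), lambda r: r.expovariate(1.0), 200000.0)
    print(unavailability, frequency)
```

For exponential durations the estimate can be compared with the analytical unavailability, the square of the component unavailability, which here is $(1/11)^2$.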

Random Number Generation 

Random numbers are needed to generate random observations from the
probability distributions. Tables of random numbers have been generated using
mechanical or electronic devices. The basic requirement for the numbers to be
random is that in a sequence each number should have equal probability of
taking on any one of the possible values and it must be statistically independent
of the other numbers in the sequence. While executing simulation on a digital
computer, the table of random numbers can be provided externally. It is,
however, more common to have the computer generate its own random
numbers. There are several good methods available and only one is described
here. It is a multiplicative congruential method and obtains the $(n+1)$th random
number $R_{n+1}$ from the $n$th random number $R_n$ by using the following
recurrence relation due to Lehmer

$$R_{n+1} = (aR_n) \ (\text{modulo } m)$$

where $a$ and $m$ are positive integers, $a < m$. The above notation signifies that
$R_{n+1}$ is the remainder when $(aR_n)$ is divided by $m$. The first random number
$R_0$ is assumed and the subsequent random numbers can be generated by this
recurrence relation. The sequence of random numbers so generated is periodic.
Great care has to be exercised in the selection of a combination of $R_0$, $a$ and $m$.
The sequence cycle should be larger than the number of random numbers
required. One combination which is satisfactory is

$a$ = 455 470 314

$m$ = $2^{31} - 1$ = 2 147 483 647

$R_0$ = any integer between 1 and 2 147 483 646.

Now if random numbers between say 0 and 999 are required, then the computer 
can be instructed to take the last three digits of the random number so 
generated. It can be seen that the sequence of random numbers so produced 
is predictable and reproducible and is not therefore strictly a sequence of 
random numbers. For this reason these random numbers are called pseudo- 
random numbers. They can, however, satisfactorily play the role of random 
numbers in digital simulation. In fact in many applications where alternative 
design configurations are being evaluated, the use of the same sequence of 
random numbers may be desirable. 
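
The recurrence is short enough to sketch directly. The multiplier and modulus below are the combination quoted in the text; the function names and the seed are illustrative choices.

```python
A = 455_470_314          # multiplier a
M = 2**31 - 1            # modulus m = 2 147 483 647

def lehmer_sequence(seed, count):
    """Return `count` pseudo-random integers from R_{n+1} = (a R_n) mod m.
    The seed R_0 must be an integer between 1 and m - 1."""
    if not 1 <= seed <= M - 1:
        raise ValueError("seed must lie between 1 and m - 1")
    r = seed
    sequence = []
    for _ in range(count):
        r = (A * r) % M
        sequence.append(r)
    return sequence

def last_three_digits(r):
    """Reduce a generated number to the range 0..999."""
    return r % 1000

if __name__ == "__main__":
    numbers = lehmer_sequence(seed=12345, count=5)
    print(numbers)
    print([last_three_digits(r) for r in numbers])
```

Running the generator twice with the same seed reproduces the same sequence, which is the property exploited when alternative designs are compared on identical random inputs.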

Simulation Model 

A simulation model representing the system to be simulated is required. The
analyst should become thoroughly familiar with the system as in the case of
mathematical modelling. In complex systems, failure modes and effects
analysis is quite useful in gaining an insight into the system behaviour. The
system is broken into elements whose behaviour can be predicted either in a
deterministic manner or in the form of probability distributions. In reliability
evaluation, continuous probability distributions are most often used. When
historical data are available, either the frequency distribution of these data or
the probability distribution which best fits these data may be used. The latter
alternative is, however, more satisfactory as it comes closer to predicting the
expected future performance rather than repeating the idiosyncrasies of the
recorded data.

The operating rules which define the effect of the elements on each other and
on the system should be specified. These rules may be probability
distributions, tables or some other set of rules. It may be preferable to draw logical
flow diagrams of the system specifying the rules and logical linkages. The
tendency to be over-realistic at the expense of simplicity should be guarded
against.

Timing Controls 

Simulation studies deal with the passage of time. There is no connection between 
the simulated time which represents the passage of time in the actual world and 
the computational time. There are two methods of representing time in 
computer simulation programs: 

i. fixed time interval method 

ii. next event method 

A brief description of each is given below.

Fixed Time Interval Method

This is also called the synchronous timing method. It is a two step method. The
basic time interval is $\Delta t$, which may be microseconds, minutes or days. The
interval length $\Delta t$ is chosen depending upon the operating characteristics
of the system. Starting in the initial state, time is advanced by $\Delta t$ and the
program then looks to see if an event has occurred. The system is then updated
by determining the resulting state of the system. If no event has occurred then the
system stays in the same state. These two steps may be repeated as many times as
desired.

Next Event Method 

This is also called the asynchronous timing method. Simulated time, in this 
method, is advanced by a variable amount rather than a fixed amount each 
time. The computer proceeds by keeping a record of the next few simulated 
events scheduled to occur. The most imminent event is assumed to have occurred 
and the simulated time is advanced to the point of occurrence of the event. The 
cycle is repeated as many times as desired. 

In essence, in the synchronous timing method, the time is advanced by 
definite amounts and every time the system is updated by determining the event 
that occurred during this interval and in the asynchronous timing method, the 
next event is determined and the time is advanced to the occurrence of this 
event. The occurrence of an event during an interval or the time to the 
occurrence of an event is determined using the following sampling techniques. 
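
The difference between the two timing controls can be sketched for a single repairable component. The example is a simplified illustration under assumptions made here: exponential up and down times with hypothetical rates, at most one event per interval in the fixed interval version, and illustrative function names.

```python
import math
import random

def fixed_interval_unavailability(lam, mu, dt, steps, seed=1):
    """Synchronous timing: advance by dt, then check whether an event
    (failure if up, repair if down) occurred during the interval."""
    rng = random.Random(seed)
    p_fail = 1.0 - math.exp(-lam * dt)      # P(failure within dt | up)
    p_repair = 1.0 - math.exp(-mu * dt)     # P(repair within dt | down)
    up = True
    down_steps = 0
    for _ in range(steps):
        if rng.random() < (p_fail if up else p_repair):
            up = not up                     # update the system state
        if not up:
            down_steps += 1
    return down_steps / steps

def next_event_unavailability(lam, mu, horizon, seed=1):
    """Asynchronous timing: sample the time to the next event and jump
    the simulated clock straight to it."""
    rng = random.Random(seed)
    t, up, down_time = 0.0, True, 0.0
    while t < horizon:
        duration = rng.expovariate(lam if up else mu)
        duration = min(duration, horizon - t)
        if not up:
            down_time += duration
        t += duration
        up = not up
    return down_time / horizon

if __name__ == "__main__":
    lam, mu = 0.1, 1.0                      # hypothetical rates
    print(fixed_interval_unavailability(lam, mu, dt=0.1, steps=200000))
    print(next_event_unavailability(lam, mu, horizon=20000.0))
    print(lam / (lam + mu))                 # analytical value for comparison
```

The fixed interval version examines every interval even when nothing happens, and its accuracy depends on the interval being short compared with the state durations; the next event version does work only when the state changes.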

Random Sampling 

When all the elements operate and interact in a deterministic manner, the
event occurring during an interval, or the time till the next event, is easily
determined. In systems with stochastic elements, these random observations
are obtained from the probability distributions using random numbers, and
methods of generating random observations from probability distributions are
required.

Discrete Distributions 

Two methods for modelling discrete distributions are described below. 

1 Proportionate Allocation Technique 

It consists of allocating the possible values of the random number to the 
various values of the random variable underlying the distribution in direct 
proportion to their respective probabilities. A random number is selected and 
the corresponding value of the random variable is the random observation. 
The method is quite useful in simulating discrete time Markov chains when the 
transition probability matrix is specified. This method is illustrated by the 
example of a man who, if he does his exercises one day, is 70% sure not to do 
them the next day. On the other hand, if he does not do his exercises one day, 
he is 60% sure not to do them the next day. Denoting the doing and not doing of 
the exercises by 0 and 1 respectively, the transition probability matrix is 

                  Final state 
Initial state      0      1 
      0           0.3    0.7 
      1           0.4    0.6 

The realizations are constructed using a table of random digits. If the man 
is in state 0, select a single random digit and the next state is determined as 
follows 

digit    event 
0-2      stay in 0 
3-9      transit to 1 

Similarly, if the man is in state 1 

digit    event 
0-3      transit to 0 
4-9      stay in 1 

The construction of a realization for ten days is shown in Table 7.1. It is 
assumed that the man does his exercises on the first day. It should be noted 
that this is only one possible realization of the stochastic process. The stochastic 
nature is evident from the sequence of states occupied. For deriving probabilities, 
a number of such realizations have to be constructed. The methods for deriving 
measures from these realizations are described later in the chapter. 



Table 7.1 Constructing a Realization 

Day    Random number    State 
 1                        0 
 2          4             1 
 3          3             0 
 4          4             1 
 5          1             0 
 6          2             0 
 7          2             0 
 8          2             0 
 9          2             0 
10          0             0 
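
The construction of Table 7.1 can be sketched in a few lines of Python. This is only an illustration: the function names `next_state` and `realization` are hypothetical, and the digits are the ones listed in the table rather than freshly generated random digits.

```python
def next_state(state, digit):
    """Map one random digit (0-9) to the next state by proportionate allocation."""
    if state == 0:
        # digits 0-2 (30%) stay in 0, digits 3-9 (70%) transit to 1
        return 0 if digit <= 2 else 1
    # digits 0-3 (40%) transit to 0, digits 4-9 (60%) stay in 1
    return 0 if digit <= 3 else 1


def realization(initial_state, digits):
    """Build the sequence of states visited, one digit per transition."""
    states = [initial_state]
    for d in digits:
        states.append(next_state(states[-1], d))
    return states


# Random digits for days 2 to 10, copied from Table 7.1
table_digits = [4, 3, 4, 1, 2, 2, 2, 2, 0]
print(realization(0, table_digits))  # [0, 1, 0, 1, 0, 0, 0, 0, 0, 0]
```

Replacing `table_digits` by digits drawn from a random number generator gives further realizations of the same process.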



2 Inverse of the Probability Distribution Method 

This method is practically the same as method 1 but is a little more involved 
and proceeds in the following steps: 

1. Construct the distribution function of the random variable X, i.e. 
$F(x) = P(X \le x)$. The distribution function has the property that it is 
monotonically non-decreasing. The probability mass function of an arbitrary 
random variable X and its distribution function are shown in Fig. 7.1. 



[Fig. 7.1 Probability mass function and probability distribution function 
$F(x) = P(X \le x)$. A random number of 0.55 marked on the $F(x)$ axis is traced 
to the corresponding random observation on the x axis.] 



2. Generate a random decimal number between 0 and 1. This is achieved by 
obtaining a random integer with the desired number of digits and then placing a 
decimal point before it. 

3. Set $F(x)$ equal to the random number and select the value of $x$ 
corresponding to $F(x)$. This value of $x$ is the desired random observation from 
the probability distribution. 

It should be noticed that $F(x_i) - F(x_{i-1})$ is equal to $P(X = x_i)$, and if the 
random number falls in the interval $(F(x_{i-1}), F(x_i))$, the value $X = x_i$ 
will be selected. The procedure therefore essentially allocates the random 
numbers to the values of the random variable in proportion to their probabilities 
of occurrence. The basic procedure is therefore the same as that of method 1. 
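
The three steps can be sketched as follows. This is a minimal sketch: the probability mass function used here is a made-up example, not the one plotted in Fig. 7.1, and `make_sampler` is a hypothetical helper name.

```python
import bisect
from itertools import accumulate


def make_sampler(values, probabilities):
    """Return a function mapping a random number z in (0, 1) to an observation."""
    cdf = list(accumulate(probabilities))  # step 1: F(x_i) for each value x_i

    def sample(z):
        # step 3: pick the first x_i with F(x_i) >= z, so that z falling
        # between F(x_{i-1}) and F(x_i) selects x_i
        i = bisect.bisect_left(cdf, z)
        return values[min(i, len(values) - 1)]

    return sample


# Hypothetical distribution on the values 0..4 (assumed for illustration)
sample = make_sampler([0, 1, 2, 3, 4], [0.1, 0.2, 0.3, 0.3, 0.1])
print(sample(0.55))  # z = 0.55 lies between F(1) = 0.3 and F(2) = 0.6
```

Step 2, generating the random number itself, would normally be done with `random.random()`; a fixed value is used here so that the mapping is easy to follow.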

Continuous Distributions 

The above procedure can also be used for continuous distributions. 
Continuous distributions are approximated by discrete distributions whose 
irregularly spaced points have equal probabilities. The accuracy can be increased 
by increasing the number of intervals into which (0,1) is divided. This requires 
additional data in the form of tables. Although the method is quite general, its 
disadvantages are the great amount of work required to develop tables and 
possible computer storage problems. The following analytic inversion approach 
is simpler. 

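Let $z$ be a random number in the range 0 to 1 with a uniform 
probability density function or, equivalently, a triangular (ramp-shaped) 
distribution function, i.e. 

$$f(z) = \begin{cases} 0 & z < 0 \\ 1 & 0 \le z \le 1 \\ 0 & z > 1 \end{cases}$$

Similarly 

$$F(z) = \begin{cases} 0 & z < 0 \\ z & 0 \le z \le 1 \\ 1 & z > 1 \end{cases}$$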



Let F(x) be the distribution function from which the random observations 
are to be generated. Let 

$$z = F(x)$$

Solving the equation for x gives a random observation of X. That the 
observations so generated do have F(x) as the probability distribution can be 
shown as follows. 






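Let $\phi$ be the inverse of $F$, then 

$$x = \phi(z)$$

Now $x$ is the random observation generated. To determine its probability 
distribution, let $X$ be any fixed value. Since $F$ is non-decreasing and $z$ is 
uniformly distributed on (0, 1), 

$$P(x \le X) = P(F(x) \le F(X)) = P(z \le F(X)) = F(X)$$

Therefore the distribution function of $x$ is $F(x)$ as required. In the case of 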
several important distributions, special techniques have been developed for 
efficient random sampling. A few cases are described in the following section. 
The reader can refer to Reference 5 for a more detailed treatment. 

Exponential Distribution 

The exponential distribution has the following probability distribution 

$$P(X \le x) = 1 - e^{-\rho x}$$

where $1/\rho$ is the mean of the random variable $X$. Setting this function equal to a 
random decimal number between 0 and 1 

$$z = 1 - e^{-\rho x}$$

Since the complement of such a random number is also a random number, the 
above equation can be written as 

$$z = e^{-\rho x}$$

Taking the natural logarithm of both sides and simplifying 

$$x = \frac{\ln(z)}{-\rho}$$

which is the desired random observation from the exponential distribution 
having $1/\rho$ as the mean. 
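
As a rough illustration, the inversion can be written directly in Python. The function name `exponential_observation` and the rate value are assumptions made for this sketch; the random number is taken as `1 - random()` so that it lies in (0, 1] and the logarithm is always defined.

```python
import math
import random


def exponential_observation(rho, z):
    """Invert z = exp(-rho * x) to obtain x = ln(z) / (-rho)."""
    return math.log(z) / -rho


rng = random.Random(42)   # fixed seed so the run is repeatable
rho = 2.0                 # assumed rate; the mean is 1 / rho = 0.5
observations = [exponential_observation(rho, 1.0 - rng.random())
                for _ in range(20000)]
sample_mean = sum(observations) / len(observations)
print(sample_mean)        # should be close to 0.5
```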

Erlang Distribution 

The above procedure can be readily extended to generate random 
observations from an Erlang distribution. It has been shown in Chapter 6 that 
the sum of $a$ independent exponentially distributed random variables, each with 
mean $1/\rho$, has the Erlang distribution with mean $a/\rho$ and $a$ as the shape 
parameter. Therefore if we have a sequence of $a$ random decimal numbers in the 
interval (0,1), denoted by $z_1, z_2, \ldots, z_a$, the random observation from the 
Erlang distribution is 

$$x = \sum_{i=1}^{a} \frac{\ln(z_i)}{-\rho} = -\frac{1}{\rho}\ln\prod_{i=1}^{a} z_i$$
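
A minimal sketch of this extension, assuming the shape parameter is the number of random numbers supplied and writing the stage rate as `rho` (both names are choices made for this illustration):

```python
import math
import random


def erlang_observation(rho, zs):
    """Sum of len(zs) exponential observations: -(1/rho) * ln(product of zs)."""
    return -math.log(math.prod(zs)) / rho


rng = random.Random(3)
a, rho = 3, 2.0    # assumed shape and rate; the mean is a / rho = 1.5
observations = [
    erlang_observation(rho, [1.0 - rng.random() for _ in range(a)])
    for _ in range(20000)
]
print(sum(observations) / len(observations))  # should be close to 1.5
```

Taking one logarithm of the product rather than `a` separate logarithms saves work; for a large shape parameter the product can underflow, in which case summing the logarithms is the safer choice.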



Normal Distribution 

The following describes a technique developed by Box and Muller. The 
method proceeds by generating pairs of normal deviates. The joint distribution of 
two independent standardized normal deviates is given by 

$$f_{XY}(x,y) = \frac{1}{2\pi}\exp\{-\tfrac{1}{2}(x^2+y^2)\} \tag{7.1}$$

Consider the polar transformation 

$$x = u\cos v, \qquad y = u\sin v$$

The inverse transformation is 

$$u = (x^2+y^2)^{1/2}$$

and 

$$v = \arctan\frac{y}{x}$$

The above functions can be written in a general form 

$$U = U(X,Y), \qquad V = V(X,Y)$$

and 

$$X = X(U,V), \qquad Y = Y(U,V)$$

When $X$ is near $x$ and $Y$ is near $y$, $U$ and $V$ must be near $u$ and $v$. Therefore 

$$P(x<X<x+dx,\; y<Y<y+dy) = f_{XY}(x,y)\,dx\,dy = P(u<U<u+du,\; v<V<v+dv) = f_{UV}(u,v)\,du\,dv \tag{7.2}$$

Therefore 

$$f_{UV}(u,v) = f_{XY}(x,y)\left|\frac{dx\,dy}{du\,dv}\right| \tag{7.3}$$

Absolute values are used so that the expression is applicable for both non- 
increasing and non-decreasing functions. 



For the polar transformation 

$$dx\,dy = u\,du\,dv$$

Therefore 

$$f_{UV}(u,v)\,du\,dv = \frac{1}{2\pi}e^{-u^2/2}\,u\,du\,dv = e^{-u^2/2}\,d\!\left(\tfrac{1}{2}u^2\right)\cdot\frac{1}{2\pi}\,dv \tag{7.4}$$

This expression can be interpreted in this manner: $u^2/2$ is exponentially 
distributed with unit mean and $v$ has a uniform distribution in the interval 
$(0, 2\pi)$. Now if there are two random decimal numbers $z_1, z_2$ in (0, 1) 

$$\tfrac{1}{2}u^2 = -\ln z_1$$

i.e. 

$$u = \sqrt{2\ln(1/z_1)}$$

and 

$$v = 2\pi z_2$$

Hence 

$$X = \sqrt{2\ln\frac{1}{z_1}}\;\cos 2\pi z_2$$

and 

$$Y = \sqrt{2\ln\frac{1}{z_1}}\;\sin 2\pi z_2$$

are exact independent normal deviates. 
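
The Box and Muller transformation is short enough to sketch directly. The function name `box_muller` is an assumption of this example, and the first random number is taken as `1 - random()` so that it lies in (0, 1] and its logarithm exists.

```python
import math
import random


def box_muller(z1, z2):
    """Return two independent standard normal deviates from z1 in (0, 1], z2 in [0, 1)."""
    u = math.sqrt(2.0 * math.log(1.0 / z1))   # radius: u^2 / 2 is exponential
    v = 2.0 * math.pi * z2                    # angle: uniform on (0, 2*pi)
    return u * math.cos(v), u * math.sin(v)


rng = random.Random(1)
deviates = []
for _ in range(10000):
    x, y = box_muller(1.0 - rng.random(), rng.random())
    deviates.extend((x, y))

mean = sum(deviates) / len(deviates)
variance = sum((d - mean) ** 2 for d in deviates) / (len(deviates) - 1)
print(mean, variance)   # should be close to 0 and 1
```

A deviate with mean m and standard deviation s is then obtained as m + s * x.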



This book covers only a few cases of analytic inversion. Many other methods 
exist for analytic inversion of particular probability distributions including 
inversion by graphical or tabular means. 



Estimating Reliability Measures 

Reliability measures can be calculated from the realizations using statistical 
methods. It is possible to construct probability distributions for the various 
residence times but usually the mean values are the main parameters. 

Time Specific Probability of $X^+$ 

If $N$ observations of the state of the system are made at time $t$, and $n_+$ of the 
times the system is found in $X^+$, the estimate of the probability of $X^+$ is found 
by 

$$\hat{p}_+(t) = \frac{n_+}{N} \tag{7.5}$$

It is well known from the frequency concept of probability that 

$$p_+(t) = \lim_{N\to\infty}\hat{p}_+(t) = \lim_{N\to\infty}\frac{n_+}{N}$$

Interval Frequency 

If in $N$ realizations of the system state, the system visited $X^+$ $n_c$ times, the 
estimate of $f_+(0, t)$ is 

$$\hat{f}_+(0,t) = \frac{n_c}{N} \tag{7.6}$$

Fractional Duration 

The fractional duration in $X^+$ is estimated by 

$$\frac{1}{N t}\sum_{i=1}^{N} u_i$$

where $u_i$ is the time spent in $X^+$ in the $i$th realization. 

Steady State Probability of being in X + 

This can be calculated either from a number of realizations, allowing 
sufficient time for each realization to reach the equilibrium state, or it can be 
calculated as the fraction of time spent in $X^+$ in a very long realization, i.e. 

$$\hat{p}_+ = \frac{t_+}{T}$$

where $t_+$ is the time spent in $X^+$ in the interval $(0, T)$ when $T$ is large. This latter 
approach is less time consuming, as in the former approach considerable time is 
spent in reaching the equilibrium condition. 

Mean Cycle Time 

After the simulated system has reached an equilibrium, the mean cycle time 
can be estimated as follows 

$$\frac{1}{n}\sum_{i=1}^{n} t_i$$

where $t_i$ is the time interval between the $(i-1)$th and the $i$th encounter of $X^+$. 
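
To make the last two estimators concrete, the sketch below simulates one long realization of a single repairable component by the asynchronous timing method and estimates the steady state probability of the down state and the mean cycle time. Everything specific here is an assumption for illustration: the two-state model, the failure rate `lam`, the repair rate `mu`, taking $X^+$ to be the down state, and ignoring the initial transient because the run is very long.

```python
import math
import random


def simulate_two_state(lam, mu, horizon, rng):
    """Return (fraction of time down, mean time between successive failures)."""
    t, down_time, entries = 0.0, 0.0, []
    while t < horizon:
        t += -math.log(1.0 - rng.random()) / lam     # advance to the next failure
        if t >= horizon:
            break
        entries.append(t)                            # an encounter of X+
        repair = -math.log(1.0 - rng.random()) / mu  # sampled repair duration
        down_time += min(repair, horizon - t)        # clip at the end of the run
        t += repair                                  # advance to the repair event
    cycles = [b - a for a, b in zip(entries, entries[1:])]
    return down_time / horizon, sum(cycles) / len(cycles)


p_down, mean_cycle = simulate_two_state(1.0, 9.0, 50000.0, random.Random(7))
print(p_down, mean_cycle)
```

For this simple model the analytical values are lam / (lam + mu) = 0.1 and 1/lam + 1/mu = 1.11, which gives a check on the estimates.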






Equilibrium Conditions and Sample Size 

The most important reliability measures in repairable systems are obtained from 
the steady state or equilibrium conditions. In this case the system state 
probabilities are independent of the initial conditions and the time elapsed 
since the start of system. The system reaches a steady state condition when the 
system state probability distribution reaches a limiting equilibrium distribution. 
It should be remembered that steady state condition can only be approached 
but never exactly attained. 

In determining the mean cycle time, the data obtained during the initial 
period of simulated system operation should be excluded. It is difficult to 
know how long the system should be operated before taking observations. It 
could, however, be roughly estimated by having a few trial runs and estimating 
the probability distributions at various points in time. It should be noted that 
even when steady state measures are the basis of evaluating alternative designs, 
the same initial conditions should preferably be used. Simulation or Monte 
Carlo techniques are used to obtain a numerical estimate of the inherent system 
reliability parameters. As the sample size increases the estimated value approaches 
the estimand. The question which now arises is how big the sample size should 
be. It is not adequate to simulate the system for an arbitrarily long time and then 
simply assume that the results are sufficiently precise. 

Reliability measures obtained from each simulated sample run are generally 
different and one object is to determine the mean value of the measure. A 
simple case is when these observations of the measure are statistically independent 
and have a common normal distribution. This case is considered and then 
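extended to more general situations. The confidence interval for the mean $m$ of 
the normal distribution having variance $\sigma^2$ can be obtained as follows. Let $\bar{X}$ 
be the sample mean obtained from a random sample of size $n$. $\bar{X}$ is then also a 
random variable and it can be shown that 

$$\frac{(\bar{X}-m)\sqrt{n}}{v}$$

has a $t$ distribution with $(n-1)$ degrees of freedom. The value of $v$ is 

$$v = \left[\sum_{i=1}^{n}\frac{(X_i-\bar{X})^2}{n-1}\right]^{1/2}$$

where $X_i$ is the $i$th observation. Now 

$$P\left\{-t_{\alpha/2}^{\,n-1} \le \frac{(\bar{X}-m)\sqrt{n}}{v} \le t_{\alpha/2}^{\,n-1}\right\} = 1-\alpha$$

where $t_{\alpha/2}^{\,n-1}$ is the upper $100\,\alpha/2$ percent point of the $t$ distribution with 
$(n-1)$ degrees of freedom and can be found from tables of the $t$ distribution. The 
above expression can be rearranged as 

$$P\left\{\bar{X} - t_{\alpha/2}^{\,n-1}\frac{v}{\sqrt{n}} \le m \le \bar{X} + t_{\alpha/2}^{\,n-1}\frac{v}{\sqrt{n}}\right\} = 1-\alpha$$

Therefore with confidence $1-\alpha$, the upper and lower bounds of $m$ are 

$$\text{upper bound} = \bar{X} + t_{\alpha/2}^{\,n-1}\frac{v}{\sqrt{n}}$$

$$\text{lower bound} = \bar{X} - t_{\alpha/2}^{\,n-1}\frac{v}{\sqrt{n}}$$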

The required sample size can be predicted by obtaining an estimate of the 
standard deviation of the observations either from pilot runs or from initial 
observations. As can be seen from above, the interval between the upper and 
lower bounds can be made as narrow as desired by making the sample size 
sufficiently large. 
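
The bounds and the sample size prediction can be sketched as follows. The run results in `runs` are invented numbers, the function names are hypothetical, and the percent point 2.262 is the tabulated two-sided 95% value of the t distribution with 9 degrees of freedom, supplied by hand as the text suggests.

```python
import math
import statistics


def confidence_interval(observations, t_value):
    """Lower and upper bounds of the mean for a tabulated t percent point."""
    n = len(observations)
    mean = statistics.fmean(observations)
    v = statistics.stdev(observations)       # uses the (n - 1) divisor
    half_width = t_value * v / math.sqrt(n)
    return mean - half_width, mean + half_width


def required_sample_size(v, t_value, half_width):
    """Smallest n for which t * v / sqrt(n) does not exceed half_width."""
    return math.ceil((t_value * v / half_width) ** 2)


# Hypothetical measures from ten independent simulation runs
runs = [1.12, 1.08, 1.15, 1.10, 1.09, 1.13, 1.11, 1.07, 1.14, 1.11]
low, high = confidence_interval(runs, t_value=2.262)
print(low, high)
print(required_sample_size(v=0.5, t_value=2.0, half_width=0.1))
```

The sample size prediction is approximate, since the percent point itself depends on n; in practice it is recomputed once the pilot estimate of v is replaced by the value from the full sample.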

Two assumptions made above are that the observations of the measure are 
statistically independent and have a common normal distribution. This, 
however, is not true in general and the following methods may be employed 
to realize these assumptions in practice. 

The first method is to have a number of independent simulated runs. The 
mean measure obtained from each run can be used as an independent 
observation. These can be assumed to be normally distributed in accordance 
with the Central Limit Theorem. The confidence interval therefore can be 
found by the procedure described. 

When steady state measures are being estimated, the above procedure can 
be quite wasteful since in each simulation run, the initial period is unproductive. 
The alternative is to use a single simulated run and to divide the steady state 
period into equally long intervals. The value derived from each interval can be 
used as an observation. It should be noted that these observations are not 
completely independent but by making the intervals sufficiently long, the 
correlation can be decreased. 
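
A small sketch of this single-run approach is given below. It assumes that the run has already been reduced to a sequence of per-cycle values and that batches of equal count are an acceptable stand-in for intervals of equal length; `batch_means` is a hypothetical helper name.

```python
def batch_means(samples, n_batches):
    """Split one long run into equal batches and return the mean of each batch."""
    size = len(samples) // n_batches           # leftover samples at the end are dropped
    return [
        sum(samples[k * size:(k + 1) * size]) / size
        for k in range(n_batches)
    ]


# Toy data: twelve values split into three batches of four
print(batch_means(list(range(12)), 3))  # [1.5, 5.5, 9.5]
```

The batch means are then treated as the observations in the confidence interval procedure described earlier; using fewer, longer batches reduces the correlation between neighbouring batches.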



Variance Reducing Techniques 

It has been pointed out that the precision of sample estimates can be increased 
by making the sample size large enough. Increasing the precision is equivalent 
to decreasing the variance of the sample estimates. The simple method of 
repeated runs (or making a single run very long and dividing it into equal 
intervals), treating the measures obtained from each sample as independent 
sample values until the variance has been reduced to the desired level, is 
usually quite time-consuming. Special techniques for reducing variance have 
been devised. These techniques extract as much and as precise information 
as possible from the amount of simulation that can be economically executed. 
There are three generally used techniques: 

1. stratification 

2. control variates 

3. antithetic variables 

The reader is referred to References 4-5 for discussion of these methods. 



References 

1. D.R. Cox and H.D. Miller, The Theory of Stochastic Processes, Methuen, 
London (1965). 

2. C.D. Flagle, W.H. Huggins and R.H. Roy, Operations Research and Systems 
Engineering, Johns Hopkins Press, Baltimore (1960). 

3. F.S. Hillier and G.J. Lieberman, Introduction To Operations Research, 
Holden-Day (1970). 

4. J.M. Hammersley and D.C. Handscomb, Monte Carlo Methods, Methuen, 
London (1964). 

5. K.D. Tocher, The Art of Simulation, English Universities Press, London 
(1963). 

6. C.F. DeSieno and L.L. Stine, A Probability Method for Determining the 
Reliability of Electric Power Systems, Trans. AIEE (February 1964). 



CHAPTER 8 

Conclusions 



We began this book by describing the place of modelling and evaluation in the 
total setting of system reliability planning. This is an important phase of a 
reliability program as the reliability model integrates the information available 
on the various components to provide overall system indices. Several reliability 
measures have been defined and techniques described for their calculation. The 
choice of a particular index depends on the penalty factors associated with 
system failures. If the penalty depends on the total duration of failure, 
availability is a relevant measure. If, however, the number of failures is more 
important, frequency is a more appropriate index. Ideally a reliability measure 
should be sensitive to all the factors which affect reliability. In practice, however, 
some measures are more sensitive to a certain parameter than the others. A 
single measure is, therefore, not likely to give a complete picture of system 
reliability and the use of more than one reliability index is preferable. 

The central concept running throughout the book is the frequency balancing 
approach. This method gives the same linear, differential or integro-differential 
equations as would be obtained by conventional means. The chief advantage of 
this approach is the simplicity with which frequency, cycle time and mean 
duration indices can be calculated. Another important feature of this technique 
is the equivalent transition rate. This concept has been used to derive conditions 
of mergeability which set the mathematical limits for model reduction. Correct 
appreciation of the conditions of mergeability is important for model reduction. 
Many gross mistakes have been committed in the literature because of a lack of 
such an understanding. 

Reliability evaluation of large and complex systems can sometimes be quite 
difficult. The difficulties are more computational than conceptual. When the 
system is composed of independent elements, model reduction using the 
concept of equivalent transition rates proves quite useful. When dependent 
failures are encountered, model reduction becomes limited. Two useful 
techniques under these circumstances are truncation and sequential truncation. 

In reliability models, assumptions are made about the probability distributions 
of times to failure and times to restore. It goes without saying that model 
formulation approaches reality only to the extent that the assumed distributions 
approach the actual ones. The bulk of this book is devoted to Markovian models. 
These models assume exponential distributions for the up and down times. 
The techniques of dealing with non-Markovian models have also been described 
in detail. Analytical methods involving integro-differential equations are useful 
but the solution can be quite intractable for fairly complex systems. The device 
of stages transforms a non-Markovian model into a Markovian model and a 
solution is therefore possible. The only disadvantage of this method is the 
multiplication of the number of states. 

Mathematical modelling provides a compact and efficient means of obtaining 
system reliability measures. The mathematical model abstracts the essence of 
the physical system and provides a deeper insight into the cause and effect 
relationships within the system. All reliability problems, however, cannot be 
solved analytically. Simulation is a more flexible approach and can take into 
account many factors which cannot be handled by mathematical models. 
The approach, however, is time-consuming and does not provide the same 
insight into the system. Wherever possible, an analytical approach is definitely 
superior to using a simulation technique. 

The emphasis throughout the book is not on the development of particular 
reliability models but on deepening the insight into the theory, philosophy and 
the techniques of system reliability modelling. The purpose is to enhance the 
reader's capability for reliability modelling and evaluation. 

We conclude this book by repeating the warning given in the introduction. 
After making a number of experiments with the reliability model, it is possible 
to develop a confidence in the results which may not prove justified by a 
closer examination of the data. The reliability model transforms data into 
reliability measures and, given a correct model, the validity of these measures 
depends on the validity of the data used. 



CHAPTER 9 

Appendices 
Appendix I 

Solution of simultaneous linear equations 

Reliability evaluation often calls for the solution of a set of simultaneous linear 
equations of the form 

AX = B 

where A is a nonsingular coefficient matrix and X and B are column vectors. 
The results could be obtained using Cramer's rule which proceeds by evaluating 
determinants and expanding by minors. The system of $n$ equations in $n$ 
unknowns takes on the order of $n!$ multiplications. If $n = 25$ and each 
multiplication takes $10^{-6}$ seconds, the computation time required would be on 
the order of $10^{11}$ years. Several numerical methods have been devised to solve linear 
equations. This book covers only the basic principles of Gauss-Jordan method 
of elimination. Readers interested in further details should refer to one of the 
many excellent books available on numerical methods. 

The basic procedure of this method is quite simple. The first variable from 
all but one of the equations is eliminated by adding an appropriate multiple of 
this equation to each of the others. The second variable is then eliminated from 
another equation in the same manner. The procedure continues until each of 
the equations has only one variable left. The result can then be read directly. 
This can be illustrated by solving the following set of linear equations. 



$$x_1 + x_2 + 3x_3 = 4 \tag{1}$$
$$x_1 - x_2 + 4x_3 = 5 \tag{2}$$
$$2x_1 - x_2 + 3x_3 = 6 \tag{3}$$

Step 1 Remove $x_1$ from (2) and (3) by multiplying Equation (1) by $(-1)$ 
and $(-2)$ respectively and adding 

$$x_1 + x_2 + 3x_3 = 4 \tag{4}$$
$$-2x_2 + x_3 = 1 \tag{5}$$
$$-3x_2 - 3x_3 = -2 \tag{6}$$

Step 2 Remove $x_2$ from (4) and (6) by multiplying (5) by $(\tfrac{1}{2})$ and $(-\tfrac{3}{2})$ 
respectively and adding to (4) and (6) respectively 

$$x_1 + \tfrac{7}{2}x_3 = \tfrac{9}{2} \tag{7}$$
$$-2x_2 + x_3 = 1 \tag{8}$$
$$-\tfrac{9}{2}x_3 = -\tfrac{7}{2} \tag{9}$$

Step 3 Remove $x_3$ from (7) and (8) by multiplying (9) by $(\tfrac{7}{9})$ and $(\tfrac{2}{9})$ and 
adding to (7) and (8) respectively 

$$x_1 = \tfrac{16}{9}$$
$$-2x_2 = \tfrac{2}{9}$$
$$-\tfrac{9}{2}x_3 = -\tfrac{7}{2}$$

Step 4 The results can now be read 

$$x_1 = \tfrac{16}{9}, \qquad x_2 = -\tfrac{1}{9}, \qquad x_3 = \tfrac{7}{9}$$

In the computer, the operations are done in matrix form. The initial step is 
to form an augmented array 

$$[A \mid B]$$

This in our present example is of the form 

$$\left[\begin{array}{rrr|r} 1 & 1 & 3 & 4 \\ 1 & -1 & 4 & 5 \\ 2 & -1 & 3 & 6 \end{array}\right]$$



Row 1 is called the pivot row and the first element of this row is called the 
pivot element. The procedure consists in setting the elements above and 
below the pivot elements (diagonal elements) to zero. 

First step. $R_1$ (row 1) is the pivot row, $a_{11}$ is the pivot element. Divide the 
pivot row by the pivot element. Set elements below the pivot element to 
zero by 

$$R_2' = R_2 - a_{21}R_1$$

$$R_3' = R_3 - a_{31}R_1$$

$$\left[\begin{array}{rrr|r} 1 & 1 & 3 & 4 \\ 0 & -2 & 1 & 1 \\ 0 & -3 & -3 & -2 \end{array}\right]$$

The prime indicates the resulting row. 

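Second step. $R_2$ is the pivot row and $a_{22}$ is the pivot element. Normalize the 
pivot row by dividing by the pivot element. 

$$\left[\begin{array}{rrr|r} 1 & 1 & 3 & 4 \\ 0 & 1 & -\tfrac{1}{2} & -\tfrac{1}{2} \\ 0 & -3 & -3 & -2 \end{array}\right]$$

Set the elements above and below the pivot element to zero by 

$$R_1' = R_1 - a_{12}R_2$$

$$R_3' = R_3 - a_{32}R_2$$

$$\left[\begin{array}{rrr|r} 1 & 0 & \tfrac{7}{2} & \tfrac{9}{2} \\ 0 & 1 & -\tfrac{1}{2} & -\tfrac{1}{2} \\ 0 & 0 & -\tfrac{9}{2} & -\tfrac{7}{2} \end{array}\right]$$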



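Third step. $R_3$ is the pivot row and $a_{33}$ is the pivot element. Normalize the pivot 
row by dividing by the pivot element. 

$$\left[\begin{array}{rrr|r} 1 & 0 & \tfrac{7}{2} & \tfrac{9}{2} \\ 0 & 1 & -\tfrac{1}{2} & -\tfrac{1}{2} \\ 0 & 0 & 1 & \tfrac{7}{9} \end{array}\right]$$

Set the elements above the pivot element to zero by 

$$R_1' = R_1 - a_{13}R_3$$

$$R_2' = R_2 - a_{23}R_3$$

$$\left[\begin{array}{rrr|r} 1 & 0 & 0 & \tfrac{16}{9} \\ 0 & 1 & 0 & -\tfrac{1}{9} \\ 0 & 0 & 1 & \tfrac{7}{9} \end{array}\right]$$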






The last column is the solution vector. It should be noted that special 
pivoting techniques are available for avoiding rounding off errors. For details 
of further refinements, books on numerical analysis should be consulted. 
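
The matrix procedure can be sketched in Python as shown below. Two choices in this sketch go beyond the worked example and are assumptions of the illustration: exact rational arithmetic (`fractions.Fraction`) is used so that the result can be compared with the hand calculation, and rows are swapped to bring the largest available element into the pivot position, which is one simple form of the pivoting mentioned above.

```python
from fractions import Fraction


def gauss_jordan(a, b):
    """Solve A X = B by Gauss-Jordan elimination on the augmented array [A | B]."""
    n = len(a)
    m = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for p in range(n):
        # choose the row (at or below p) with the largest element in column p
        best = max(range(p, n), key=lambda r: abs(m[r][p]))
        if m[best][p] == 0:
            raise ValueError("coefficient matrix is singular")
        m[p], m[best] = m[best], m[p]
        pivot = m[p][p]
        m[p] = [x / pivot for x in m[p]]          # normalize the pivot row
        for r in range(n):
            if r != p:                            # clear column p above and below
                factor = m[r][p]
                m[r] = [x - factor * y for x, y in zip(m[r], m[p])]
    return [row[n] for row in m]                  # last column is the solution


solution = gauss_jordan([[1, 1, 3], [1, -1, 4], [2, -1, 3]], [4, 5, 6])
print(solution)  # [Fraction(16, 9), Fraction(-1, 9), Fraction(7, 9)]
```

With floating point numbers instead of fractions the same loop applies, and the row swap then serves its usual purpose of limiting rounding error.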



Appendix II 



Shape of the hazard rate function of two series stage combinations in parallel 

The expression of the probability density function of this stage combination 
as given in Chapter 6 is 



The survivor function is 



(2) 



and the hazard rate function is 

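(i) At the origin 

$$\phi(0) = f(0)$$

since 

$$S(0) = 1$$

The following conclusions can be drawn regarding the magnitude of the hazard 
rate at the origin 

$$\phi(0) = \begin{cases} 0 & \text{if } a_1 > 1,\ a_2 > 1 \\ \omega_1\rho_1 & \text{if } a_1 = 1,\ a_2 > 1 \\ \omega_2\rho_2 & \text{if } a_1 > 1,\ a_2 = 1 \\ \omega_1\rho_1 + \omega_2\rho_2 & \text{if } a_1 = a_2 = 1 \end{cases}$$

(ii) The derivative of $\phi(x)$ at the origin 

$$\phi'(x) = \frac{f'(x)S(x) + \{f(x)\}^2}{\{S(x)\}^2}$$

$$\phi'(0) = f'(0) + \{f(0)\}^2$$

Now by the initial value theorem 

$$f'(0) = \lim_{s\to\infty} s\,[s\bar{f}(s) - f(0)]$$

The expression for $\bar{f}(s)$ is 

$$\bar{f}(s) = \omega_1\left(\frac{\rho_1}{\rho_1+s}\right)^{a_1} + \omega_2\left(\frac{\rho_2}{\rho_2+s}\right)^{a_2}$$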
The following conclusions can be drawn 
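(a) $a_1 = a_2 = 1$ 

Then 

$$s\bar{f}(s) - f(0) = \frac{\omega_1\rho_1 s}{s+\rho_1} + \frac{\omega_2\rho_2 s}{s+\rho_2} - (\omega_1\rho_1 + \omega_2\rho_2) = -\frac{\omega_1\rho_1^2}{\rho_1+s} - \frac{\omega_2\rho_2^2}{\rho_2+s}$$

Therefore 

$$f'(0) = \lim_{s\to\infty} s\left[-\frac{\omega_1\rho_1^2}{\rho_1+s} - \frac{\omega_2\rho_2^2}{\rho_2+s}\right] = -\omega_1\rho_1^2 - \omega_2\rho_2^2$$

and 

$$\phi'(0) = -\omega_1\rho_1^2 - \omega_2\rho_2^2 + (\omega_1\rho_1 + \omega_2\rho_2)^2 = -\omega_1\omega_2(\rho_1-\rho_2)^2$$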

This expression is always negative and therefore for this condition, the 
hazard rate is always initially decreasing. See curve 1 of Fig. 6.9. 

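(b) $a_1 = 1$ and $a_2 = 2$ 

$$s\bar{f}(s) - f(0) = -\frac{\omega_1\rho_1^2}{s+\rho_1} + \omega_2 s\left(\frac{\rho_2}{s+\rho_2}\right)^2$$

Therefore 

$$f'(0) = \lim_{s\to\infty} s\{s\bar{f}(s) - f(0)\} = -\omega_1\rho_1^2 + \omega_2\rho_2^2$$

and 

$$\phi'(0) = -\omega_1\rho_1^2 + \omega_2\rho_2^2 + (\omega_1\rho_1)^2 = \omega_2(\rho_2^2 - \omega_1\rho_1^2)$$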

The sign will depend upon that of the quantity inside the parenthesis. This is 
negative for curve 2 in Fig. 6.9 and therefore the hazard rate is initially 
decreasing. 

(c) $a_1 = 2$ and $a_2 = 1$ 

$$\phi'(0) = \omega_1(\rho_1^2 - \omega_2\rho_2^2)$$

(d) $a_1 = 2$ and $a_2 = 2$ 

$$\phi'(0) = f'(0) \quad \text{since } f(0) = 0$$

Therefore 

$$\phi'(0) = \lim_{s\to\infty} s^2\bar{f}(s) = \omega_1\rho_1^2 + \omega_2\rho_2^2$$

The hazard rate is therefore initially increasing in this case. See curve 4 in 
Fig. 6.9. 

(e) $a_1 > 2$, $a_2 > 2$ 

$$\phi'(0) = \lim_{s\to\infty} s^2\bar{f}(s) = 0$$

The hazard rate is, therefore, initially constant as can be verified from 
curve 5 of Fig. 6.9. 

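(f) $a_1 = 1$ and $a_2 > 2$ 

$$s\bar{f}(s) - f(0) = -\frac{\omega_1\rho_1^2}{\rho_1+s} + \omega_2 s\left(\frac{\rho_2}{s+\rho_2}\right)^{a_2}$$

$$\lim_{s\to\infty} s\{s\bar{f}(s) - f(0)\} = -\omega_1\rho_1^2$$

Therefore 

$$\phi'(0) = -\omega_1\rho_1^2 + (\omega_1\rho_1)^2 = -\omega_1\omega_2\rho_1^2$$

(g) $a_1 > 2$ and $a_2 = 1$ 

$$\phi'(0) = -\omega_1\omega_2\rho_2^2$$

In cases (f) and (g) the hazard rate is, therefore, initially decreasing as can 
be seen from curve 3 of Fig. 6.9. 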

(iii) $\phi(x)$ as $x \to \infty$ 

(a) $\rho_1 > \rho_2$ 

$$\lim_{x\to\infty}\phi(x) = \rho_2$$

(b) $\rho_2 > \rho_1$ 

$$\lim_{x\to\infty}\phi(x) = \rho_1$$

(c) $\rho_1 = \rho_2$ 

$$\lim_{x\to\infty}\phi(x) = \rho_1 = \rho_2$$

The limiting value of the hazard rate as $x$ becomes large is therefore always 
the smaller of $\rho_1$ or $\rho_2$. 

The three quantities 

i. $\phi(x)$ as $x \to 0$ 

ii. $\phi'(x)$ as $x \to 0$ 

and 

iii. $\phi(x)$ as $x \to \infty$ 

are enough to get an approximate idea of the shape of the hazard rate. A 
knowledge of the behaviour of these quantities is helpful in making finer 
adjustments in the shape of the hazard rate. 
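
These three quantities can be checked numerically. The sketch below assumes that the stage combination is a mixture of two special Erlangian distributions, with weight $\omega_1$ on $a_1$ stages of rate $\rho_1$ and weight $1 - \omega_1$ on $a_2$ stages of rate $\rho_2$, which is the form implied by the Laplace transform used in this appendix; the function names and parameter values are chosen only for illustration.

```python
import math


def erlang_pdf(x, a, rho):
    """Density of a stages in series, each with rate rho."""
    return rho * (rho * x) ** (a - 1) / math.factorial(a - 1) * math.exp(-rho * x)


def erlang_survivor(x, a, rho):
    """Probability that a stages in series are not all completed by time x."""
    return math.exp(-rho * x) * sum((rho * x) ** k / math.factorial(k) for k in range(a))


def hazard(x, w1, a1, rho1, a2, rho2):
    """Hazard rate f(x) / S(x) of the two series stage combinations in parallel."""
    w2 = 1.0 - w1
    f = w1 * erlang_pdf(x, a1, rho1) + w2 * erlang_pdf(x, a2, rho2)
    s = w1 * erlang_survivor(x, a1, rho1) + w2 * erlang_survivor(x, a2, rho2)
    return f / s


# Case (a): a1 = a2 = 1 with assumed values w1 = 0.4, rho1 = 1, rho2 = 3
args = (0.4, 1, 1.0, 1, 3.0)
h = 1e-6
print(hazard(0.0, *args))                             # w1*rho1 + w2*rho2 = 2.2
print((hazard(h, *args) - hazard(0.0, *args)) / h)    # about -w1*w2*(rho1-rho2)^2 = -0.96
print(hazard(40.0, *args))                            # approaches min(rho1, rho2) = 1
```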



Appendix III 



Hazard rate shape of series stages in series with a distinctive stage 

(i) $\phi(x)$ at $x = 0$ 

It can be seen by examining the ratio of Equations (6.69) and (6.70) that 

$$\phi(0) = 0$$

(ii) $\phi'(x)$ at $x = 0$ 

As in Appendix II 

$$\phi'(0) = \lim_{s\to\infty} s\,[s\bar{f}(s) - f(0)] + \{f(0)\}^2 = \lim_{s\to\infty} s^2\bar{f}(s) \quad \text{since } f(0) = 0$$

Now the Laplace transform of Equation (6.69) is 

$$\bar{f}(s) = \left(\frac{\rho}{\rho+s}\right)^{a}\frac{\rho_1}{\rho_1+s}$$

Therefore 

$$\phi'(0) = \begin{cases} \rho\rho_1 & \text{if } a = 1 \\ 0 & \text{if } a > 1 \end{cases}$$

(iii) $\phi(x)$ as $x \to \infty$ 

It can be proved by examining the ratio of Equations (6.69) and (6.70) that 

$$\lim_{x\to\infty}\phi(x) = \begin{cases} \rho_1 & \text{if } \rho_1 < \rho \\ \rho & \text{if } \rho < \rho_1 \end{cases}$$

The final value is therefore the lesser of the two. 



Appendix IV 

Series stages in series with two parallel stages 

Derivation of expressions 

Designating the state whose duration is represented by this combination as 0 
and the state of not being in it as A, the state transition diagram is shown in 
Fig. IV.1. 

[Fig. IV.1 State transition diagram to derive expressions for the 
stage combination] 

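Assuming $p_1(0) = 1.0$, the time spent in state 0 is identical with the time 
since the origin and, as explained in Chapter 6, 

$$f_0(x) = \rho_1 p_{21}(x) + \rho_2 p_{22}(x)$$

$$S_0(x) = \sum_{i=1}^{a} p_i(x) + p_{21}(x) + p_{22}(x)$$

The differential equations for this system are 

$$p_1'(t) = -\rho\, p_1(t)$$
$$p_2'(t) = \rho\,(p_1(t) - p_2(t))$$
$$\vdots$$
$$p_n'(t) = \rho\,(p_{n-1}(t) - p_n(t))$$
$$\vdots$$
$$p_a'(t) = \rho\,(p_{a-1}(t) - p_a(t))$$
$$p_{21}'(t) = \omega_1\rho\, p_a(t) - \rho_1 p_{21}(t)$$
$$p_{22}'(t) = \omega_2\rho\, p_a(t) - \rho_2 p_{22}(t)$$

Taking the Laplace transform of these differential equations 

$$\bar{p}_1(s) = \frac{1}{\rho+s}$$

$$\bar{p}_2(s) = \frac{\rho}{(\rho+s)^2}$$

Similarly 

$$\bar{p}_n(s) = \frac{\rho^{n-1}}{(\rho+s)^n} \tag{2}$$

$$\bar{p}_a(s) = \frac{\rho^{a-1}}{(\rho+s)^a}$$

and 

$$\bar{p}_{21}(s) = \omega_1\left(\frac{\rho}{\rho+s}\right)^{a}\frac{1}{s+\rho_1} \tag{4}$$

$$\bar{p}_{22}(s) = \omega_2\left(\frac{\rho}{\rho+s}\right)^{a}\frac{1}{s+\rho_2}$$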



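The inverse of Expression (2) is 

$$p_n(x) = \frac{(\rho x)^{n-1}}{(n-1)!}e^{-\rho x}$$

For taking the inverse of Expression (4), it can be expanded into partial 
fractions 

$$\bar{p}_{21}(s) = \omega_1\left(\frac{\rho}{\rho+s}\right)^{a}\frac{1}{s+\rho_1} = \frac{N_1}{\rho+s} + \frac{N_2}{(\rho+s)^2} + \cdots + \frac{N_a}{(\rho+s)^a} + \frac{M}{\rho_1+s}$$

The numerators can be determined as 

$$M = (\rho_1+s)\,\bar{p}_{21}(s)\Big|_{s=-\rho_1} = \omega_1\left(\frac{\rho}{\rho-\rho_1}\right)^{a}$$

and 

$$N_a = (\rho+s)^a\,\bar{p}_{21}(s)\Big|_{s=-\rho} = \omega_1\rho^a\frac{1}{\rho_1-\rho}$$

Let 

$$N_{a-n} = \frac{1}{n!}\frac{d^n}{ds^n}\left[(\rho+s)^a\,\bar{p}_{21}(s)\right]_{s=-\rho} = (-1)^n\frac{\omega_1\rho^a}{(\rho_1-\rho)^{n+1}}$$

Then 

$$N_i = (-1)^{a-i}\frac{\omega_1\rho^a}{(\rho_1-\rho)^{a-i+1}}$$

Substituting these values into the expansion of $\bar{p}_{21}(s)$ and inverting 

$$p_{21}(x) = \omega_1\left(\frac{\rho}{\rho-\rho_1}\right)^{a}e^{-\rho_1 x} - \sum_{i=1}^{a}\frac{\omega_1\rho^a}{(\rho-\rho_1)^{a-i+1}}\frac{x^{i-1}}{(i-1)!}e^{-\rho x}$$

$$= \omega_1\left(\frac{\rho}{\rho-\rho_1}\right)^{a}\left[e^{-\rho_1 x} - e^{-\rho x}\sum_{i=1}^{a}\frac{\{(\rho-\rho_1)x\}^{i-1}}{(i-1)!}\right]$$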

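A similar expression can now be easily derived for $p_{22}(x)$ and finally 

$$S_0(x) = P_0(x) = \sum_{n=1}^{a}\frac{(\rho x)^{n-1}}{(n-1)!}e^{-\rho x} + \omega_1\left(\frac{\rho}{\rho-\rho_1}\right)^{a}\left[e^{-\rho_1 x} - e^{-\rho x}\sum_{n=1}^{a}\frac{\{(\rho-\rho_1)x\}^{n-1}}{(n-1)!}\right] + \omega_2\left(\frac{\rho}{\rho-\rho_2}\right)^{a}\left[e^{-\rho_2 x} - e^{-\rho x}\sum_{n=1}^{a}\frac{\{(\rho-\rho_2)x\}^{n-1}}{(n-1)!}\right] \tag{7}$$

and 

$$f_0(x) = \rho_1 p_{21}(x) + \rho_2 p_{22}(x) = \omega_1\rho_1\left(\frac{\rho}{\rho-\rho_1}\right)^{a}\left[e^{-\rho_1 x} - e^{-\rho x}\sum_{n=1}^{a}\frac{\{(\rho-\rho_1)x\}^{n-1}}{(n-1)!}\right] + \omega_2\rho_2\left(\frac{\rho}{\rho-\rho_2}\right)^{a}\left[e^{-\rho_2 x} - e^{-\rho x}\sum_{n=1}^{a}\frac{\{(\rho-\rho_2)x\}^{n-1}}{(n-1)!}\right] \tag{8}$$

Expression (8) could alternatively be derived using the fact that the 
Laplace transform of the probability density function of a sum of independent 
random variables is the product of their Laplace transforms. The random 
variable $X$ representing the state 0 is the sum of the random variables $X_1$ and $X_2$ 
denoting the states 1 to $a$ and the parallel states 21 and 22. The probability 
density function of stages 1 to $a$ is 

$$f_1(x) = \frac{\rho(\rho x)^{a-1}}{(a-1)!}e^{-\rho x}$$

and 

$$\bar{f}_1(s) = \left(\frac{\rho}{\rho+s}\right)^{a}$$

The probability density function of the parallel stages is 

$$f_2(x) = \omega_1\rho_1 e^{-\rho_1 x} + \omega_2\rho_2 e^{-\rho_2 x}$$

and the corresponding Laplace transform is 

$$\bar{f}_2(s) = \omega_1\frac{\rho_1}{\rho_1+s} + \omega_2\frac{\rho_2}{\rho_2+s}$$

Now 

$$\bar{f}_0(s) = \bar{f}_1(s)\,\bar{f}_2(s)$$

Substituting the values and inverting, $f_0(x)$ can be obtained. This is left as an 
exercise for the reader. 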

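The mean duration = the sum of the means 

$$= \frac{a}{\rho} + \frac{\omega_1}{\rho_1} + \frac{\omega_2}{\rho_2}$$

Variance 

The two random variables $X_1$ and $X_2$ are independent and therefore the 
variance of $X$ is the sum of the variances of $X_1$ and $X_2$. 

$$\text{Variance} = \frac{a}{\rho^2} + \frac{2\omega_1}{\rho_1^2} + \frac{2\omega_2}{\rho_2^2} - \left(\frac{\omega_1}{\rho_1} + \frac{\omega_2}{\rho_2}\right)^2$$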
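
As a plausibility check, the mean and variance can be compared with a direct simulation of the stage combination: $a$ exponential stages of rate $\rho$ followed by one of two parallel exponential stages chosen with probabilities $\omega_1$ and $\omega_2 = 1 - \omega_1$. The parameter values and function names below are assumptions made for this sketch.

```python
import random


def stage_mean_variance(a, rho, w1, rho1, rho2):
    """Mean and variance of a series stages followed by two parallel stages."""
    w2 = 1.0 - w1
    mean = a / rho + w1 / rho1 + w2 / rho2
    variance = (a / rho ** 2 + 2.0 * w1 / rho1 ** 2 + 2.0 * w2 / rho2 ** 2
                - (w1 / rho1 + w2 / rho2) ** 2)
    return mean, variance


def sample_duration(a, rho, w1, rho1, rho2, rng):
    """One simulated residence time in the state represented by the combination."""
    x = sum(rng.expovariate(rho) for _ in range(a))           # series stages
    x += rng.expovariate(rho1 if rng.random() < w1 else rho2) # one parallel stage
    return x


params = (2, 1.0, 0.4, 1.0, 3.0)          # a, rho, w1, rho1, rho2 (assumed values)
mean, variance = stage_mean_variance(*params)

rng = random.Random(11)
xs = [sample_duration(*params, rng) for _ in range(100000)]
sim_mean = sum(xs) / len(xs)
sim_var = sum((x - sim_mean) ** 2 for x in xs) / (len(xs) - 1)
print(mean, sim_mean)
print(variance, sim_var)
```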



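Shape of the Hazard Rate 

(i) $\phi(0) = 0$ 

(ii) $\phi'(0) = \lim_{s\to\infty} s^2\bar{f}(s)$ 

Now 

$$\bar{f}(s) = \left(\frac{\rho}{\rho+s}\right)^{a}\left[\omega_1\frac{\rho_1}{\rho_1+s} + \omega_2\frac{\rho_2}{\rho_2+s}\right]$$

Therefore 

$$\phi'(0) = \begin{cases} \rho(\omega_1\rho_1 + \omega_2\rho_2) & \text{if } a = 1 \\ 0 & \text{if } a > 1 \end{cases}$$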



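(iii) $\phi(x)$ as $x \to \infty$ 

(a) $\rho = \min(\rho, \rho_1, \rho_2)$, $\rho \ne \rho_1 \ne \rho_2$ 

$$\lim_{x\to\infty}\phi(x) = \lim_{x\to\infty}\frac{f(x)/x^{a-1}e^{-\rho x}}{S(x)/x^{a-1}e^{-\rho x}}$$

$$= \frac{-\omega_1\rho_1\left(\dfrac{\rho}{\rho-\rho_1}\right)^{a}\dfrac{(\rho-\rho_1)^{a-1}}{(a-1)!} - \omega_2\rho_2\left(\dfrac{\rho}{\rho-\rho_2}\right)^{a}\dfrac{(\rho-\rho_2)^{a-1}}{(a-1)!}}{\dfrac{\rho^{a-1}}{(a-1)!} - \omega_1\left(\dfrac{\rho}{\rho-\rho_1}\right)^{a}\dfrac{(\rho-\rho_1)^{a-1}}{(a-1)!} - \omega_2\left(\dfrac{\rho}{\rho-\rho_2}\right)^{a}\dfrac{(\rho-\rho_2)^{a-1}}{(a-1)!}}$$

$$= \frac{\rho\rho_1\rho_2 - \rho^2(\omega_1\rho_1 + \omega_2\rho_2)}{\rho_1\rho_2 - \rho(\omega_1\rho_1 + \omega_2\rho_2)} = \rho$$

(b) $\rho_1 = \min(\rho, \rho_1, \rho_2)$, $\rho \ne \rho_1 \ne \rho_2$ 

$$\lim_{x\to\infty}\phi(x) = \lim_{x\to\infty}\frac{f(x)/e^{-\rho_1 x}}{S(x)/e^{-\rho_1 x}} = \frac{\omega_1\rho_1\left(\dfrac{\rho}{\rho-\rho_1}\right)^{a}}{\omega_1\left(\dfrac{\rho}{\rho-\rho_1}\right)^{a}} = \rho_1$$

(c) $\rho_2 = \min(\rho, \rho_1, \rho_2)$, $\rho \ne \rho_1 \ne \rho_2$ 

$$\lim_{x\to\infty}\phi(x) = \rho_2$$

(d) $\rho = \rho_1 = \rho_2$, the combination becomes a Special Erlangian distribution 
having 

$$\lim_{x\to\infty}\phi(x) = \rho$$

(e) $\rho_1 = \rho_2 < \rho$, then 

$$\lim_{x\to\infty}\phi(x) = \frac{\omega_1\rho_1\left(\dfrac{\rho}{\rho-\rho_1}\right)^{a} + \omega_2\rho_2\left(\dfrac{\rho}{\rho-\rho_2}\right)^{a}}{\omega_1\left(\dfrac{\rho}{\rho-\rho_1}\right)^{a} + \omega_2\left(\dfrac{\rho}{\rho-\rho_2}\right)^{a}} = \omega_1\rho_1 + \omega_2\rho_2 = \rho_1 = \rho_2$$

(f) $\rho = \rho_1 < \rho_2$ 

Then 

$$\lim_{x\to\infty}\phi(x) = \rho$$

It can be concluded from above that 

$$\lim_{x\to\infty}\phi(x) = \min(\rho, \rho_1, \rho_2)$$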



Appendix V 

Moments of Stage Combinations 

Series of Identical Stages 

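The Laplace transform of the probability density function is 

$$\bar{f}(s) = \left(\frac{\rho}{\rho+s}\right)^{a}$$

Differentiating successively and substituting $s = 0$ 

$$\bar{f}^{(r)}(0) = (-1)^r\frac{1}{\rho^r}\prod_{k=1}^{r}(a+k-1)$$

The $r$th moment is therefore 

$$m_r = \frac{1}{\rho^r}\prod_{k=1}^{r}(a+k-1)$$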

Two Series Stages in Parallel 

The $r$th moment in this case can be written as 

$$m_r = \frac{\omega_1}{\rho_1^r}\prod_{k=1}^{r}(a_1+k-1) + \frac{\omega_2}{\rho_2^r}\prod_{k=1}^{r}(a_2+k-1)$$

Series stages in series with a distinctive stage 
That is 

(s + pf(s + , h )f(s) = Pl p a 

Differentiating both sides 

(s.+ p)(s' + P,)As) + {«(s + Pi) + (* + P)}/(*) = 0 (1) 

Differentiating once again 

(s + p)(s + p 1 )f"( S )+[2(s + p) + (a+lXs + p l )]f'(s) 

+ (a + \)f(s) = 0 (2) 

Differentiating r times 

(s + p)(s + p 1 )f ir) (s) + {r(s + p) + (a+r- l)(s + Pi)}/^ 1 ^) 

+ '0r-I)(a+r-l)/ (r - 2) (s) = 0 (3) 

Putting 

s = 0 
and . 

/ (,,) (0) = (-1 )'>/,,. 

From Equation (1) 

pPxtn-L = api + p for r' = 1 
From Equation (2) 

PPi^2 — {2p + {a + l)pi}m 1 =— for r = 2 
From Equation (3) 

PPim r - {rp + (a + r- l)pi}m r _! + (r- l)(a + r- l)m r _ 2 ..= 0 
for r > 2 
In the matrix form 

[A][m] = [B] (4) 

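where

$A = [a_{ij}]$ is the coefficient matrix such that

$$a_{ij} = \begin{cases} 0 & \text{if } j > i \text{ or } j < i - 2 \\ \rho \rho_1 & \text{if } j = i \\ -\{i\rho + (a + i - 1)\rho_1\} & \text{if } j = i - 1 \\ (i - 1)(a + i - 1) & \text{if } j = i - 2 \end{cases}$$

and

$$[B] = \begin{bmatrix} \rho + a\rho_1 \\ -(a + 1) \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$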

The r moments can be found by solving the set of linear Equations (4). 
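Because the coefficient matrix is lower triangular, the system can be solved by forward substitution. The sketch below is an illustration under stated assumptions rather than the book's own program: the helper names `build_system` and `forward_substitute` and the sample values of $a$, $\rho$ and $\rho_1$ are hypothetical.

```python
def build_system(a: int, rho: float, rho1: float, r: int):
    """Build [A] and [B] for the first r moments of `a` series stages
    (rate rho) in series with one distinctive stage (rate rho1)."""
    A = [[0.0] * r for _ in range(r)]
    B = [0.0] * r
    for i in range(1, r + 1):                  # 1-based row index, as in the text
        A[i - 1][i - 1] = rho * rho1           # a_ii
        if i >= 2:                             # a_{i,i-1}
            A[i - 1][i - 2] = -(i * rho + (a + i - 1) * rho1)
        if i >= 3:                             # a_{i,i-2}
            A[i - 1][i - 3] = (i - 1) * (a + i - 1)
    B[0] = rho + a * rho1
    if r >= 2:
        B[1] = -(a + 1)
    return A, B


def forward_substitute(A, B):
    """Solve a lower-triangular system A m = B row by row."""
    m = []
    for i, row in enumerate(A):
        partial = sum(row[j] * m[j] for j in range(i))
        m.append((B[i] - partial) / row[i])
    return m


if __name__ == "__main__":
    # Illustrative values only: a = 2 stages of rate 1.0, distinctive stage of rate 2.0.
    A, B = build_system(a=2, rho=1.0, rho1=2.0, r=3)
    print(forward_substitute(A, B))
```

As a sanity check, the first moment returned should equal $a/\rho + 1/\rho_1$, the sum of the mean times spent in the series stages and in the distinctive stage.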



Series stages in series with two parallel stages 

This is equivalent to two 'series stages in series with a distinctive stage' in 
parallel. The rth moment for the whole combination is obtained by 

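$$[m] = [m_1] + [m_2]$$

$[m_1]$ and $[m_2]$ are found from the equations

$$[A_1][m_1] = [B_1]$$

and

$$[A_2][m_2] = [B_2]$$

In this case

$$[B_1] = \begin{bmatrix} (\rho + a\rho_1)\omega_1 \\ -(a + 1)\omega_1 \\ 0 \\ \vdots \end{bmatrix}, \qquad [B_2] = \begin{bmatrix} (\rho + a\rho_2)\omega_2 \\ -(a + 1)\omega_2 \\ 0 \\ \vdots \end{bmatrix}$$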



Appendix VI 

Calculation of the Jacobian Matrix for two series stages in parallel 

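Assuming values for $a_1$ and $a_2$, the remaining three parameters $\rho_1$, $\rho_2$ and $\omega_1$ can be calculated by matching the first three moments. Therefore

$$X_0 = [\rho_{10}\ \ \rho_{20}\ \ \omega_{10}]^T, \qquad \phi = [\phi_1\ \ \phi_2\ \ \phi_3]^T$$

The elements of the $i$th column of the Jacobian matrix can be obtained by differentiating

$$\phi_i = \frac{\omega_1}{\rho_1^i} \prod_{k=1}^{i} (a_1 + k - 1) + \frac{\omega_2}{\rho_2^i} \prod_{k=1}^{i} (a_2 + k - 1) - M_i$$

where $\omega_2 = 1 - \omega_1$. That is

$$\frac{\partial \phi_i}{\partial \rho_1} = -\frac{i\,\omega_1}{\rho_1^{i+1}} \prod_{k=1}^{i} (a_1 + k - 1)$$

$$\frac{\partial \phi_i}{\partial \rho_2} = -\frac{i\,\omega_2}{\rho_2^{i+1}} \prod_{k=1}^{i} (a_2 + k - 1)$$

and

$$\frac{\partial \phi_i}{\partial \omega_1} = \frac{1}{\rho_1^i} \prod_{k=1}^{i} (a_1 + k - 1) - \frac{1}{\rho_2^i} \prod_{k=1}^{i} (a_2 + k - 1)$$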
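The partial derivatives above can be checked numerically. The following Python sketch is an assumption-laden illustration, not the book's code: the names `phi`, `jacobian` and `fd_jacobian`, the target moments `M`, and the trial point are all hypothetical, and the derivatives of $\phi_i$ are stored here as row $i$ (the transpose of the column layout mentioned in the text).

```python
from math import prod


def rising(a: int, i: int) -> int:
    """prod_{k=1..i} (a + k - 1)."""
    return prod(a + k - 1 for k in range(1, i + 1))


def phi(x, a1, a2, M):
    """Moment mismatch phi_i for i = 1, 2, 3 at x = (rho1, rho2, omega1)."""
    rho1, rho2, w1 = x
    w2 = 1.0 - w1
    return [w1 * rising(a1, i) / rho1**i
            + w2 * rising(a2, i) / rho2**i
            - M[i - 1] for i in (1, 2, 3)]


def jacobian(x, a1, a2):
    """Analytic derivatives of phi_i with respect to rho1, rho2, omega1."""
    rho1, rho2, w1 = x
    w2 = 1.0 - w1
    return [[-i * w1 * rising(a1, i) / rho1**(i + 1),
             -i * w2 * rising(a2, i) / rho2**(i + 1),
             rising(a1, i) / rho1**i - rising(a2, i) / rho2**i]
            for i in (1, 2, 3)]


def fd_jacobian(x, a1, a2, M, h=1e-6):
    """Central finite-difference approximation, used only as a cross-check."""
    J = [[0.0] * 3 for _ in range(3)]
    for j in range(3):
        hi = list(x); lo = list(x)
        hi[j] += h; lo[j] -= h
        f_hi, f_lo = phi(hi, a1, a2, M), phi(lo, a1, a2, M)
        for i in range(3):
            J[i][j] = (f_hi[i] - f_lo[i]) / (2 * h)
    return J


if __name__ == "__main__":
    x0 = (1.0, 2.0, 0.4)          # illustrative trial point
    M = (1.5, 4.0, 15.0)          # illustrative target moments
    print(jacobian(x0, 2, 3))
    print(fd_jacobian(x0, 2, 3, M))
```

The target moments drop out of the derivatives, so the Jacobian depends only on the current parameter estimates and the assumed stage counts.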



Series Stages in Series with Two Parallel Stages

Assuming the number of stages to be $a$, the remaining four parameters, $\rho_1$, $\rho_2$, $\rho$ and $\omega_1$, can be calculated by matching the first four moments. The vector

$$\phi = m - M$$

where $m$ and $M$ are vectors of the stage model moments and the moments of the distribution to be approximated. Since $m = m_a + m_b$, the Jacobian of $\phi$ at $X_0$ becomes

$$\phi'(X_0) = m'_a(X_0) + m'_b(X_0)$$

$m'_a(X_0)$ and $m'_b(X_0)$ can be obtained by differentiating and solving

$$A_1 m_a = B_a \quad \text{and} \quad A_2 m_b = B_b$$



This solution can be obtained using Gauss elimination. 
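One way to read "differentiating and solving" is sketched below. The symbol $x_j$, standing for any one of the four parameters $\rho_1$, $\rho_2$, $\rho$, $\omega_1$, is a label introduced here for illustration and does not appear in the text. Applying the product rule to $A_1 m_a = B_a$ gives a linear system for each column of $m'_a$ with the same coefficient matrix:

```latex
\frac{\partial A_1}{\partial x_j}\, m_a + A_1\, \frac{\partial m_a}{\partial x_j}
  = \frac{\partial B_a}{\partial x_j}
\quad\Longrightarrow\quad
A_1\, \frac{\partial m_a}{\partial x_j}
  = \frac{\partial B_a}{\partial x_j} - \frac{\partial A_1}{\partial x_j}\, m_a
```

The same step applies to $A_2 m_b = B_b$. Under this reading, $m_a$ is found first, the right-hand side is then evaluated at $X_0$, and the resulting system is solved by elimination for each parameter in turn.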



Index 



Absorbing state, 44 

Age specific failure rate, 11-12 

Algorithm for minimal cut sets, 127, 129
Allocation, 4 
Alternate designs, 6 
Alternating renewal process, 80 
Analytical modelling, 5 
Antithetic variables, 224 
Approximation for non-Markovian model, 198
Asynchronous method, 214 
Availability, 

steady state, 3, 221 

time specific, 2, 69, 220 
Average values, 87 

Block schematic diagram, 109 
Boole's inequality, 118, 119
Box and Muller method, 219 

Central limit theorem, 223 
Central moment, 16 
Change out time, 173, 179 
Chapman-Kolmogorov equation, 35, 49

Characteristic function, 17-18 
Closed communicating class, 39 
Coefficient of excess, 17 
Coefficient of skewness, 17 
Complex transition rates, 207-10
Conditions of mergeability, 135-41
Confidence interval, 222-3
Connected sub network, 116
Continuous parameter Markov chains, 
48-61, 177 

dishonest process, 49 

equilibrium probability distribution, 53

first passage time, 57 

forward equations, 51 

honest process, 49 



Kolmogorov differential equation, 
52 

mean cycle time, 61 

mean duration, 60 

mean time to first failure, 57, 59 

time homogenous, 49 

transient behaviour, 52 

transition rate, 51 
Continuous random variable, 8-9 
Continuous state, 170 
Control variates, 224 
Correlation coefficient, 15
Countable set, 7 
Countably infinite, 7 
Covariance, 15 
Cramer's rule, 227
Cut set, 116

Cut set approach for frequency, 120-5 
Cut set manipulation of probabilities, 119

Cut set methods, 119 
Cyclic set, 39 

Data collection, 5 

Decomposition approach, 91, 106-7 

Denumerable set, 7 

Dependent failures, 91 

Device of stages, 179-195 

derivation of characteristics, 181 
determination of parameters, 180-91
selection of stage combinations, 179-80
Diagonal matrix, 45 
Discrete random variable, 7-8 
Dishonest process, 49 

Eigenvalue, 41-3, 71 
Eigenvectors, 41-3, 71 
Equilibrium alternating renewal process, 80
Equilibrium distribution, 39-40, 53 






Equivalent transition rate, 87, 135 
Ergodic set, 39 

Estimation of reliability measures, 220-21
Events, 8
Event space, 91
Expectation, 13 
Exponential distribution, 22-5 

distribution of residual life time, 24 

random sampling, 208 

Factors of influence method, 4 
Failure modes and effects analysis, 90, 109, 213
Failure modes, effects and criticality analysis, 5, 90
Fatigue failures, 31
Final value theorem, 21
Finite set, 7 

First passage time, 44-5, 57 
Fixed interval method, 214 
Fluctuating environment, 144-5 
Force of mortality, 11-12 
Forward equation, 51, 68 
Fractional duration, 2, 69, 71-2, 221 
Frequency 

interval, 70, 72 

steady state, 3, 74 

time specific, 66, 67 
Frequency balance, 96 
Frequency balance equation, 101 
Frequency balancing approach or technique, 48, 63-79, 225
Frequency equilibrium, independent components, 78-9
Functional diagram, 89 
Fundamental matrix, 45 

Gamma distribution, 31, 183 
Gamma function, 30 
Gauss elimination, 133 
Gauss-Jordan method, 133, 227

Hazard function, 11-12 
Honest process, 49 

Incomplete gamma distribution, 32 
Independent components, 206-7 
Initial moment, 16
Initial value theorem, 21
Interconnected power systems, 106 



Interstate transition rate, 63-6 
Interval frequency, 2, 70, 72, 221 
Inverse of probability distribution method, 216-17

Jacobian matrix, 194, 242-3 

Key component, 106 

Kolmogorov differential equation, 52 

Laplace transform, 20 

Large systems, problem areas, 132-4 

Law of large numbers, 13 

Logic diagram, 109 

Lognormal distribution, 28-9, 195 

Maintainable systems, 89 
Markov chains, 36-48, 177 

absorbing state, 44 

closed communicating class, 39 

cyclic or periodic, 39 

diagonal matrix, 45 

equilibrium distribution, 39-40 

ergodic set, 39 

first passage time, 44-5 

fundamental matrix, 45 

mean time to first failure, 44, 57, 59

steady state probabilities, 40 

stochastic matrix, 37 

time specific behaviour, 41

transient set, 39 

transition matrix, 37 
Mathematical model, 211, 212
Mean cycle time, 3, 61, 75, 84, 221 
Mean duration, 3, 60, 75 
Mean first passage time, 3 
Mean passage time, 3 
Mergeability, 88 

conditions of, 135-41 
Method of supplementary variables, 165-76
Mixture of Erlangian distributions, 187

hazard function of, 230-3 

moments of, 239-40 
Modified renewal process, 85 
Moments, 16-17 

Moment generating function, 18-20 
Moments of stage combinations, 239-41






Monte Carlo technique, 222 
Multiplicative congruential method, 212

Network method (approach), 91, 109-29
Network reduction procedure, 110-11
Newton-Raphson method, 192, 194-5 
Next event method, 214 
Non-maintained systems, 89 
Non-Markovian systems, 164 
Non-singular coefficient matrix, 227 
Normal distribution, 25-28 
random sampling of, 219-20 

Operational methods, 19-22 

Parallel systems, 103-4 

m/n parallel, 104-5

parallel redundant, 103 
Path, 116 

Physical diagram, 109
Poisson distribution, 24
Possibility space, 7
Probability density function, 10-11
Probability distributions, 22-33
Probability generating function, 20
Probability laws, 9-12
Probability mass function, 10
Proportionate allocation technique, 215-16

Pseudo random numbers, 213 
Pumping system, 7 

Random number generation, 212 
Random sampling, 214-20 
Random variables, 8-9 
Reliability, 2 

Reliability block diagram, 109
Reliability planning, 1
Reliability program, 1
Reliability program elements, 1
Reliability report, 6
Renewal density, 84-5
Renewal theory, 80
Residual life time, 81

Sample space or sample description space, 7
Semi-Markov process, 176-9 
Sensitivity analysis, 211 
Sequential truncation, 154, 158 



Series stages in series with distinctive stage, 189-91
hazard function of, 232-4
moments of, 240-1
Series stages in series with two parallel stages, 191, 234-9

moments of, 241-2 
Series system, 97-102 

dependent case, 99-100 

independent case, 97-8 

with spare, 100-102 
Similar familiar technique, 4 
Simple path, 116 
Simpson's rule, 199
Simulation, 5, 211 
Simulation model, 213 
Simultaneous linear equations, 227-9 
Skew coefficient, 17 
Special distributions, 22-33 
Special Erlangian distribution, 32, 183 

moments of, 239 

random sampling of, 218-19 
Stages in parallel, 185-8 
Stages in series, 182-5 
Standard deviation, 14 
Standard normal distribution, 26 
State space, 91 
State space approach, 91 
State space truncation, 151-3 
State transition diagram, 91 
Stochastic matrix, 37 
Stochastic processes, 33-6 

indexing parameter, 33 

parameter space, 33

probability distribution, 35-6 

realization of, 34 
Stratification, 224 
Survivor function, 11
Synchronous timing method, 214 
System, 89 

description of, 89-90 
System reliability, 2 

quantitative measures, 2 

reference indices, 2 

t-distribution, 222
Testing, 5 
Tie set, 117 

Tie set manipulation of probabilities, 117-18
Tie set methods, 116 






Time homogenous Markov process, 49
Time specific availability, 2
Time specific behaviour of Markov chains, 41
Time specific frequency, 66-7 
Timing controls, 214 
Transform methods, 17-22 
Transient behaviour, 52 
Transient set, 39 
Transition matrix, 37 
Transition rates, 51, 63-6 
Triangular distribution function, 217 
Truncated normal distribution, 27 
Two series stages in parallel, 187 



hazard function of, 230-3 
moments of, 239-40 
Two state Markov process, 54 

Uncountable or uncountably infinite, 8
Uniform probability density function, 217

Variance, 14 

Variance reducing techniques, 223-4 
Vertex state, 120 

Waiting time, 177 
Weibull distribution, 30-1 



